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ABSTRACT 


Several forms for constructing novel physics-informed neural-networks (PINNs) for the solution of partial- 
differential-algebraic equations (PDAEs) based on derivative operator splitting are proposed, using the nonlinear 
Kirchhoff rod as a prototype for demonstration. 


The present work is a natural extension of our review paper in [1] aiming at both experts and first-time learn- 
ers of both deep learning and PINN frameworks, among which the open-source DeepXDE (DDE) [2] is likely the 
most well documented framework with many examples. Yet, we encountered some pathological problems (time 
shift, amplification, static solutions) and proposed novel methods to resolve them. 


Among these novel methods are the PDE forms, which evolve from the lower-level form with fewer un- 
known dependent variables (e.g., displacements, slope, finite extension) to higher-level form with more dependent 
variables (e.g., forces, moments, momenta, etc.), in addition to those from lower-level forms. Traditionally, the 
highest-level form, the balance-of-momenta form, is the starting point for (hand) deriving the lowest-level form 
through a tedious (and error prone) process of successive substitutions. The next step in a finite element method 
is to discretize the lowest-level form upon forming a weak form and linearization with appropriate interpolation 
functions, followed by their implementation in a code and testing. The time-consuming tedium in all of these 
steps could be bypassed by applying the proposed novel PINN directly to the highest-level form. 


We also developed a script based on JAX, the High Performance Array Computing. For the axial motion 
of elastic bar, while our JAX script did not show the pathological problems of DDE-T (DDE with TensorFlow 
backend), it is slower than DDE-T. Moreover, that DDE-T itself being more efficient in higher-level form than in 
lower-level form makes working directly with higher-level form even more attractive in addition to the advantages 
mentioned further above. 


Since coming up with an appropriate learning-rate schedule for a good solution is more art than science, we 
systematically codified in detail our experience running optimization through a normalization/standardization of 
the network-training process so readers can reproduce our results. 
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1 Introduction 


As a prototype of partial-differential-algebraic equations (PDAEs) for novel physics-informed neural net- 
work (PINN) formulations, the geometrically-exact rod with no shear is considered here, starting with the 
assumption often attributed to Kirchhoff (or Kirchhoff-Clebsch, or Kirchhoff-Love, or Kirchhoff-Clebsch- 
Love; see the review in [3] [4] and the references therein): Plane cross section initially perpendicular to the 
undeformed centroidal line remains perpendicular to the deformed centroidal line.' 


Our formulation’s starting point is similar to that in [4], then takes on a different direction as we are 
aiming at developing a finite deformable rod, without introducing approximations, culminating in a system 
of nonlinear partial differential equations, complemented by algebraic expressions with derivatives (AEDs), 
i.e., a PDAE system of the form 


Bi (w(X,t),t) = 0, w(X,t) = {Ruy Hoy, Okra}, {Ofuj, Huy, Auj},uj] » 82) 


where gl” is a nonlinear differential operator, operating on u(X,t), which collects the unknown dependent 
variables {u, } and their partial derivatives, with {k, i,j, p,q} being indices to be defined later in the paper. 

An example of a PDAE system is the motion of the Kirchhoff-rod Form-1 (Section 3.2) Eq. (41), 
Eq. (42), Eq. (36), Eq. (43); these equations form an implicit system of nonlinear PDEs in terms of the 
four dependent variables {w,v,a@}, &. While it is possible to eliminate two dependent variables {a@, & } by 


'Rven though this assumption is part of the “Kirchhoff-Love hypotheses” in contemporary literature, neither Kirchhoff, 

nor Love, made these hypotheses [3], in which the author wrote “Major and minor authors alike have found the 
arguments of KIRCHHOFF not persuasive.” The method of derivation in the 1992 review paper [3] based on the 
first Piola-Kirchhoff stress tensor closely resembles, however, the derivation in Simo (1982) [5], Simo, Heljmstad & 
Taylor (1984) [6], and Simo (1985) [7]. 
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substituting Eq. (36) and Eq. (43) into Eq. (41) and Eq. (42), the resulting equations, which can be put in 
generic form as, 


U(X, t) = F(y(X,t),¢) , (1) 
where the partial space derivatives are included in the nonlinear operator F, become much more complex. 


After deep learning achieved a sharp decrease in image classification error in 2012, and then by 2015 
surpassed human performance [1], research in AI was revived after a long “winter” and took on a burgeon- 
ing turn, especially in deep learning. Following this trend, as a method to solve PDEs without a need for 
discretization as in traditional methods (such as finite-difference, finite-element, boundary-element meth- 
ods), physics-informed neural network (PINN) was reintroduced [8] [9] after a long dormant period since 
first proposed in the late 1990s [10] [11]. PINN is given a key role in “Multi-physics & multi-scale model- 
ing,” the first of the nine motifs in the “Simulation Intelligence” roadmap proposed in [12]. For nonlinear 
problems, it is recently reported that PINN is 37 times faster than the traditional finite-difference method 
[13]. Since PINN converts all problems, even linear ones, to a nonlinear optimization problem, the more a 
problem is nonlinear, the better for the application of PINN. 


This present work is a natural extension of our review paper in [1] aiming at both experts and first- 
time learners of both deep learning and PINN frameworks, among which the open-source DeepXDE (DDE) 
[2] is likely the most well documented framework with many examples and a forum for discussion among 
users. So naturally, we chose to implement our novel PINN formulations in DDE with TensorFlow backend, 
abbreviated as DDE-T. Yet, we encountered some pathological problems (time shift, amplification, static 
solutions, as exemplified through the motion of an elastic bar) and propose novel methods to resolve them. 


Among these novel PINN formulations are the different PDE forms, obtained by splitting the derivative 
operators, evolving from the lower-level form with fewer unknown dependent variables (e.g., displacements, 
slope, finite extension; see the Kirchhoff-rod Form 1 in Section 3.2) to higher-level form with more depen- 
dent variables (e.g., forces, moments, momenta, etc.), in addition to those from lower-level forms (see 
Kirchhoff-rod Form 4 in Section 3.6). 


Traditionally, the highest-level form, the balance-of-momenta form, is the starting point for (hand) 
deriving the lowest-level form through a tedious (and error prone) process of successive substitutions. The 
next step in a finite element method is to discretize the lowest-level form, after forming the weak form and 
constructing a linearization, with appropriate interpolation functions, followed by their implementation and 
testing in a FE code [14] [15] [16]. There are a number of finite-element frameworks to choose from, e.g., 
the Berkeley code FEAP by Taylor [17] and Netgen/NGSolve [18]. 


The time-consuming tedium of all of these traditional steps could be bypassed by our novel application 
of PINN directly to the highest-level momentum form.” 


Perplexed with the number of pathological problems encountered with linear dynamic problems when 
using DDE-T, as mentioned above, in parallel with developing novel methods to circumvent the problems 
in DDE-T, we also developed a script based on JAX, the High Performance Array Computing. 


For the axial motion of elastic bar, while our JAX script did not show the pathological problems of 
DDE-T (DDE with TensorFlow backend), and is faster than DDE-T for lower-level Form 1, it is slower 
than DDE-T in all PDE forms tested. Moreover, that DDE-T itself being more efficient in higher-level form 
than in lower-level form makes working directly with higher-level form even more attractive in addition to 
the advantages mentioned further above. 


Arriving at an appropriate learning-rate schedule for a category of problems is more art than science, 


> To filter the static solutions with near-zero acceleration encountered when using DDE-T, we introduce various forms 
of barrier functions, a novelty in PINN formulations, with room for improvements. A full-length report, with much 
more details than in the shortened version by half that is the present paper, can be found on arXiv.org [?]. 
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and has to be done by trial-and-error. Because of the large number of parameters in the training process, 
we systematically codify our experience running optimization (training process) through a normalization/- 
standardization of the network-training process and the result presentation so readers could understand and 
reproduce our results. The training speed, or computational efficiency, is most likely related to how the 
gradient is computed in different software (e.g., DDE-T vs JAX). 


Large computational efficiency is recently obtained with the proposed novel PINN formulations for the 
transverse motion of the Euler-Bernoulli beam and for the geometrically-exact Kirchhoff rod to be reported 
in a follow-up publication. 


2 PDEAs: Beam formulations with no shear 


As a particular case of the general partial-differential-algebraic equations (PDEAs) represented by Eq. (82), 
consider the geometrically-exact beam with no shear, i.e., the Kirchhoff rod, a derivation for which is first 
provided followed by pointing out the inconsistency of such formulation based on geometrically-exact beam 
with shear. Then approximations are introduced to recover the classical linear Euler-Bernoulli beam with 
axial deformation in preparation for an implementation in a PINN framework, beginning with DDE-T, then 
with JAX. 


do (X + dX, t) 


Figure 1: Geometrically-exact beam without shear deformation. Section 2.1. Shear forces 
introduced for equilibrium. Section 2.2: Inconsistency in Kirchhoff rod (large deformation) 
and Euler-Bernoulli (small deformation) beam theories. Figure 2, geometrically-exact beam 
with shear deformation. 


2.1 Geometrically-exact beam with no shear, Kirchhoff rod 


All equations are first derived in dimensional form, then subsequently converted to non-dimensional form. 
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2.1.1 Kinematics, finite extension, constitutive relations 


Figure | provides a pictorial definition of the finite extension e expressed as 
dé dé 


c= l= yy 1, with dL = dX , anddé = (1+e)dX , (2) 
with the following kinematics of deformation (x, y) = ¢o(X, t) such that 
g=X+u(X,t)>dr=dX+du=dlcoosa, y=v(X,t) > dy=dv=dlsina, (3) 


where {u(X,t), v(X,t)} are the axial and transversal displacements, respectively. The gradients of the 
spatial coordinates (x, y) with respect to the material coordinate X are then: 


dx dé cos a 
oe eb Uli g NY ae ={i ae 
1x +u TL (1+) cosa, qe Ts 
which give rise to the expression for the slope a of the centroidal line, together with the relations involving 
finite extension e and the displacements (wu, v): 
(1) 
v 


tana = ——., ecosa= 1+u% — cosa, esina =v — sina. (5) 
1+uQ) 


The last two expressions in Eq. (5)2,3 provide then the extension for e in terms of the displacements (u, v) 
and the slope a: 


a aa (4) 


e= (1 +u — cos a) + (vo — sin a) | ve ; (6) 


Using Eq. (4), some convenient variables involving the extension e and the displacement derivatives (u), 
v) are introduced: 


2 2 oO) ie hee a) ie oe 1 
c* = (1+e) = [144] + |v] so={[i+u] + |v] } =k. (7) 
The traditional elastic axial and bending constitutive relations are: 
N=EAe, M=ElIa"™). (8) 


2.1.2 Balance of momenta, non-dimensional form 


Referring to Figure 1, the equilibrium of forces and moment of the depicted infinitesimal segment leads to 
the following expressions. 


Balance of linear momentum along X axis: 


S° Fx =0=[(Ncosa)|x44x — (Ncosa)|y] — [(Ssina)|y 49x — (Ssina)| x] 


oe ferdx — fieax = d(N cosa) — d(Ssina) + fxdX — (pAdX)u. (9) 


Balance of linear momentum along Y axis: 


\_ Fy =0=[(Nsina)|x44x — (Nsina)| x] — [(Scosa)|y4 gx — (Scosa)| x] 


a fmax — fiadxX = d(N sina) — d(S cosa) + fydX — (pAdX)%. (10) 
Balance of angular momentum: 
> M|, =0=—M(X) + M(X +dX)4 S(X + dX)dl + Mapp — Miner ; (11) 


with Mapp = O(dé?),  Minent = p La dé 8%, (12) 
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Neglecting higher-order terms in dé and the area moment of intertia [4 of the cross section A as customarily 
done for thin beams without shear deformation, and taking the limit dX — 0, we obtain 


dM dX dM 
ae es — yu 
ea ee may a eee (13) 


Non-dimensionalization 


Coordinates and displacements, using Eq. (3): 


X=s, B=>=X+4, BaF, g=5=557, (14) 

[X] =] = =@1=)=1. (15) 

Normal force and shear force: 

N= Ny, with =] = ar a and [N|] = F, so [N] =1, (16) 

s= a, and [S] =1. (17) 
EI 


In Eq. (16), [N] = # means the axial force N has the dimension of force, denoted by F, and similarly for 
length L, mass J, and time J as used in Eq. (20) below. 


Distributed force using Eq. (16): 


x. — ip Ze 
f= ph with = |F Wl=gg=! (18) 
Moment: 
ee! L LF 
M = 57M, since [M] = la Mi= got (19) 
Time: 
A M/L M/L a 
T= L*\/(pA)/(ED); since B = oe = ee = gf thus [7] =7 , (20) 


t= -. and [f] =1. (21) 
T 


Balance of momenta 
Space derivative of a generic force F(X): 
_ dF(X) dX d (fr = ) LEIdF(X) _ Elza) 


re) = ale F(X))= a 
dX dX dX \ L? a LL? dx i 


(22) 
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Time derivatives of a generic displacement function x(t) with length dimension &: 


dx _ did{Lz@] _Laz@) _ Ls 


* S 23 

dt dt df 7 dé oT oe 
bdaeoi): Las . BE ws 

i <= T= Ds 24 
T dt (dt)? pe" pALs a 

Generic balance of linear momentum in non-dimensional form using Eq. (21), Eq. (22) and Eq. (24): 

FY 4 f= pAB> FU LF HE. (25) 

Space derivative of displacements and extension using Eq. (14): 

a) =: 4) = axe Bes) = au ) =0, and v) =o) , (26) 

Ox Ox OX Ox 
Similarly, using Eq. (26) in Eqs. (5)-(6) and Eq. (7), we have: 
a=@, e=8, ecosa=€écosa, esina=ésina, c=, R=fkh. (27) 


Using Eq. (25) in Eq. (9) and Eq. (10), together with Eq. (27), we have the balance of linear momenta along 
X and Y as: 


(N cos a) —(S sina)” +fx=Ut, (Nsin ay + (S'cos ay +fp=o. (28) 
Space derivative of moment from Eq. (19) and similar to Eq. (22), we have: 

EI— 
MY = we . (29) 


Balance of angular momentum. Eq. (13) in non-dimensional form based on Eq. (29), Eq. (17), and Eq. (27)5: 
MY +65 =03M+465=0, (30) 
which © can be expressed in terms of the space derivative of the displacement components (U, 0) as defined 


in Eq. (7), using Eq. (26). 


2.1.3 Constitutive relations, non-dimensional form 


Non-dimensional constitutive relation for axial deformation: 


EI — a, A 
N= BAe => 77 N = EAC > N= ——eé=4e, (31) 
with non-dimensional slenderness parameter 4 and its inverse being the rotundness ~ 
AI? : I 
Bs) = 37 : 32 
ae AL? — 
The space derivative of the slope a of the centroidal line is: 
da OX Oa 1 

CL) ox = — Lal) , 33 
= ax = ax ax L° a 
Non-dimensional constitutive relations for bending deformation using Eq. (8), Eq. (19), and Eq. (33): 
M = Elo) > M =a). (34) 


2.1.4 Equations of motion, non-dimensional form 


The first two terms in each of the non-dimensional balance of linear momentum in X and in Y direction, 
respectively, as expressed in Eq. (28) can be written in terms of displacements as follows. Using the axial 
constitutive relation Eq. (31) and the definition of a in Eq. (5) together with the expressions related to the 
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finite extension e defined in Eqs. (5)-(6), and again using Eq. (27), we have: 


N cos@ = J€cosa@ = 4 1 +u — cosa , Nsing = sésina = 3 [> — sina] : (35) 
er ee 36 
an a = en) ) (36) 


from which the first term as a function of (V,@) with a space derivative, in each equation in Eq. (28) is, 
respectively: 


(N cosa) =3 Ee +a sin a| : (N sina)? =4 ja) —~a cosal . (37) 


Next, using the balance of angular momentum, yielding the moment-shear relation with finite-extension 
effect, in Eq. (30), the bending constitutive relation in Eq. (34), and the definition of a in Eq. (5), and making 
use of Eq. (27)1, we obtain: 


Ssina = 2M sina = Ra) sin a, Scosa= Ra”) cosa . (38) 
oh Fa sna) = esl Eal)] q@) 

~ (S sin a) =-— (ha sin a) — a sin @ + [Ra a? cos @ (39) 
aay (1) go 8) 4 \ rene hae ba) a sina) 

+ (S cos a) = (ha cos z) =— [Ra cos @ + [Ra a’ sina? . (40) 


Using Eq. (37) for the derivative of the terms with (V,@), and Eqs. (39)-(40) for the derivative of the terms 
with ee @) in Eq. (28), the non-dimensionalized balance of linear momenta, we obtain the equations of 
motion in terms of the displacements (wu, v). 


Axial equation of motion: 


3 fn” +a sin a| + Ra] sin@ + Ra) a) cosa + Fx =%, (41) 


Transversal equation of motion: 


3 [3 — a cos a — Ra?) " cos @ + Ra?) a sina +fy =, (42) 


where the slenderness 4 was defined in Eq. (32), and R in Eq. (27), Eq. (26), and Eq. (7), reproduced here 
so all relevant terms in the above equations of motion are grouped together for convenience: 


ioe . (32) 
R= { 1 f a) i fo] a ; (43) 


2.1.5 Input parameters: Boundary and initial conditions, distributed load 

In addition to the slenderness 4 defined in Eq. (32) and the input distributed load functions f y(X,#) and 
fy (X, #) for Eqs. (41)-(42), respectively, i.e., 

fx(X,t) = fx(X,t), fy(X,d) = fy(X,t), (44) 


there are six input parameters in each of the two sets of boundary conditions considered here: One set for 
cantilever beam, and another for simply-supported beam. 


Cantilever beam 
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Clamped end: Three input parameters are the prescribed displacements and rotation {i v, at, such that 
AtX=0, UxX=0,A=%, d(0,f)=3, a(0,f)=a. (45) 


Free end: Three input parameters, two algebraic expressions with derivatives (AEDs). 

AtX=1, se(lQ=N, —-R(1,)MO(1,) =-R01,)aP(10=5, aap=M. (46) 
The three input parameters in Eq. (46) are the prescribed concentrated forces and moment {V, S, M}, and 
the two AEDs are Eqs. (46)1,2 due to the expression of € as a result of Eqs. {(5), (6), (26), (27)2}, and the 
expression of & in Eq. (43), with both € and & involving the first-order derivatives u and vo, These AEDs 
together with the differential equations from the balance of momenta form a system of differential-algebraic 
equations (DAEs). 


Simply-supported beam 
AtX =0, U(0,f)=%, (0,2) =%, a@(0,2) = Mo. (47) 


AtX=1, selQ=M1, w1o=%, a®on=™M. (48) 


Because of the expression for € in Eq. (48)1, the equations of motion of the simply-supported beam also 
form a system of DAEs. 


Remark 2.1. AEDs cannot be imposed exactly. Because “complex” boundary conditions are AEDs, such 
as Eq. (48)1, which involve derivatives of the dependent variables, they cannot be imposed exactly by hard 
constraints. Only “simple” boundary conditions not involving derivatives of dependent variables, such as 
Eqs.(48)2,3, can be imposed exactly. | 


There are four prescribed functions for the four initial conditions: 


2.2 Inconsistency, reduction from geometrically-exact beam with shear 


Deformation map of centroidal line [19], which is valid for both geometrically-exact beam without shear 
(Figure 1) and with shear (Figure 2, is written as: 


do(X, t) = [x ss u(X, t)| ila u(X, tee ’ (51) 


where {e1, €2} are the spatial basis vectors. The material basis vectors {t1, t2} attached to the cross section 
are then in terms of the rotation @ of the cross section 


t,(X,t) = cosO(X,t)-e1 + sinO(X,t)-e2, to(X,t) = —sin@(X, t)-e, + cosO(X,t)-e2. (52) 


t;| | cos@ sin@| fer|  ,rfer 
ae —sin@ ey {a} =—- ee 2) 


Spatial strains {1, y2} and material strain {[,, C2}; see Figure 2: 


v1) flt+u™—cos6| _ {(1+.e) cosa —cosd Ty ar l” (54) 
oJ v — sin 6 ~ | (1+e)sina—sind f ’ T2f of ° 
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“ 


Figure 2: Geometrically-exact beam with shear deformation. The deformed configuration 
(solid line) is superposed on the initial configuration (dotted line) of unit length (which could 
be multiplied by dX). Shear deformation is the difference between the angle a of the deformed 
centroidal line and the rotation 0 of the cross section. Spatial strains {71,72} and material 
strains {T,,T2}, such that y = ye, + y2e2 = Tit; + Tate. See [19] and Figure | for 
geometrically-exact beam with no shear. 


Next, using Eq. (4) in Eq. (54), we obtain 
v1 _ (1 + €) cosa — cos 6 . (55) 
2 (1 + e) sina — sin@ 


and in the case where 0 = a, we have 


Nl COs @ Vil: aw [el . je 
(nba etsmad? {raf at a op (59 


which indicates that there is zero shear deformation, and thus zero shear force S, since the material (axial 
and shear) forces {N1, No} = {.N, S} are computed from the linear constitutive relation [19] as follows: 


{ie} : "o A tn} 7 {oJ 7 (sf ' (57) 


But a non-zero shear force S was introduced in Figure | and in the balance of linear momenta to derive the 
equations of motion as done above. 


Remark 2.2. The authors of [19] cited (among other references) [20], where neither the dynamic problem, 
nor the computational formulation, was considered. It is also noted that the static “Form 4” in [20] was the 
result of a projection onto the cross-section basis vectors {t1, t2}, whereas the dynamic Form 4 (Section 3.6) 
considered here remains in the inertial frame. | 
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2.3 Small deformation, Euler-Bernoulli 


For small axial deformation e < 1, which includes inextensibility e = 0, the moment-shear relation in 
Eq. (13) is approximated as 
dM 

f=0=—] +8 58 
e< ax +5: (58) 
which is the classical relation between moment and shear. 
For small slope a and small axial strain uD), the slope of the centroidal line can be approximated as the 
slope of the transverse displacement using Eq. (5), i-e., 


1) 


a < 10° (small angle) and uw) <lsaxv=5,5a20Y =3,. (59) 


From Eq. (26); on the space-derivative of the transverse displacement, the slope of the deformed centroidal 
line can be written as 


Ovu(X, t) 
Ox 
and thus the space derivative of the slope is: 
a) _ ov OX a 1 iy 
= Ox OX OX i 

again using Eq. (26)s. 


Sy(X,t) = =v (X,t), 3,(X,f) := 2+ 


Ss 


(61) 


Non-dimensional constitutive relations for bending deformation using Eq. (8), Eq. (19), Eq. (34), Eq. (59), 
and Eq. (61): 


M = Els) =v) > M =3) = , (62) 


which is the moment-curvature relation for the Euler-Bernoulli beam. 


For the axial equation of motion in Eq. (41), the small-angle approximation in Eq. (59) together with the 
near-zero slope approximation: 


a0=> sina 0and cosaxl, (63) 
lead to the standard axial equation of motion with distributed load: 

SU?) + Fy BU. (64) 
Without applied axial distributed load, the axial Eq. (64) becomes the standard 1-D wave equation: 


50 => UW = ra ; (65) 
with the square-root of the slenderness, i.e., /4, as the non-dimensionalized wave speed. The more slender 
(i.e., the less rotund) the bar is, the faster the wave propagates along the axial direction. 


For the transverse equation of motion in Eq. (42), the approximations in Eq. (59) (small angle) and Eq. (63) 
(near-zero slope) lead to: 


3 ja) — a cos a| 20, Ra) a sina = 0 = [Ra] a + Pe =D ; (66) 


which, upon using the approximation & ~ 1 together with the small-angle approximation in Eq. (59), leads 
to the standard Euler-Bernoulli beam equation of motion with fourth space derivative on the transverse 
displacement v: 


oO 4 Fy 28. (67) 
Eq. (64) and Eq. (67) form the system of non-dimensionalized PDEs that describes the dynamics of small- 
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deformation Euler-Bernoulli beam. 


3 PINN formulations 


Several forms of the equations of motion of the Kirchhoff rod—starting from the lower level (Form 1) to 
the higher level (Form 4)—are provided below with significant consequences on the corresponding PINN 
implementation and computational efficiency. 


3.1 Inputs 


AL? 7 a ne n az 
Inputs: d= —— (32), Fx(X = fx(X,t), Fx = fx(%,t) A) 


3.2 Form 1: No operator splitting 


Form 1: No operator splitting 


eo) [a +a sin a + [Ra®)) ” sin @ + [Ra®)| a) cosa + = U , (41) 


3 jo) — a cos a| — Ra] a cos @ + Ra) a) sina + Wee =% : (42) 


ee io, Bad emOl oy 43 
oD Cs enaaee 1 ee [p+ a] + [p 


Dependent variables: 4 


{U, 0, a}, h. 


In principle, the number of dependent variables could be reduced from 4 to 3 by substituting the expression 
for & in Eq. (43) into the balance of linear momenta in Eqs. (41)-(42). Due to the space derivative on k, 
it would be simpler and computationally more efficient to use Form 1 above to avoid complex expressions 
and multiple computations of the derivatives 7 and 5), 

Similarly, while it is possible to further reduce the number of dependent variables from 3 to 2, because 
of the derivative a), it is simpler and more efficient to use Form 1 above, while paying attention to avoid 
multiple computation of sin @ and cos a. 


Remark 3.1. Form 1, pinned-pinned bar: Time shift and early stopping. A characteristic of Form | is the 
floating/shifting computed time history when there is convergence in the optimization process. Time shift 
examples are shown in Figures 20 and 21 for the axial motion of a pinned-pinned elastic bar; see Eq. (85) 
for the definition of the pinned boundary conditions. 

Another characteristic of Form 1 is “early stopping,” i.e., the lowest loss value occurs well before the 
end of the optimization process (see, e.g., [1]), which may diverge for large learning rates, and which could 
be stopped earlier. See Remark 5.18 and examples therein. | 


Remark 3.2. Form /, pinned-free bar: Static solution. For sufficiently large learning rate, a static solution 
could manifest in the training process; see Remark 5.19. a 


3.3 Form 2a: Split time derivatives 


Form 2a: Split only time derivatives to reduce to first order. 


(68) 


Sle 

I 
SI 
a 


3 [z® +a sin a| + Ra?) a sin @ + Ra] a) cosa + Fx = Dx , 
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ae (1) = = 0 . 
B) ja) — a cos a| — Ra) cos @ + Ra?) a) sina + fy =Py, UV=Dy (69) 
a) 2 2) —1/2 
a i 7) (1) 
tma=——ay GO, ® { [ita mn E } (43) 
Dependent variables: 6 
1, Vv; ic kh, {Px , Dy}. 


The number of dependent variables could be reduced from 6 to 4 by substituting the expressions for @ in 
Eq. (36) and for & in Eq. (43) into the balance of linear momenta in Eqs. (68)-(69) at the cost of repetitive 
evaluations and thus a loss of efficiency. 


Remark 3.3. Form 2a: Static solution. Under the right conditions, a static solution could occur in Form 2a 
for both the pinned-pinned bar and the pinned-free bar. Examples are given in Remark 5.20 with methods 
to avoid the static solution mentioned. Static solution could also be found in Form 1 for the pinned-free bar; 
see Remark 3.2, Remark 5.19. | 


3.4 Form 2b: Split space derivatives 


Form 2b: Split only space derivatives to reduce to first order. 


5s) +0 sina] — 8" sina — Sa" cosat fx =%, (70) 
2 ES — a cos a| 4 S cosa — Sa sina +fy =, (71) 
a =3,, a =M, (72) 
p=, MO =—c5, (73) 
a) 2 2 i/2 
me=——— €). FS {ft +a] + fo] } (7), 27) 
aka 


Dependent variables: 7 
{u, U, a}, ©, M, {Su, Sy} 


3.5 Form 3: Split time and space derivatives 


Form 3: Split both time and space derivatives to reduce to first order. 


3 ES +a sin a| oar go) sina — Sa) cosa + fx = Px ’ ey 
3 EY — al cos a =F ee cosa — Sa sina + eae We) 
DV =3, t=py, MO =-25, 7”) 


1) 2 2 
tana = ao CO), 2= {[1 +a) aL [>] } CHD) 


14 IJNME RLT 90th birthday v2.3.6 arXiv (submitted v1 on 2023.10.15) - 2024/07/16 


Dependent variables: 9 
{u, Vv; a}, C, M, {Dx Py}, {Su, Sy} 


The number of dependent variables can be reduced from 9 to 8 by substituting the expression for © only 
once in the expression relating moment to shear in Eq. (76)3, without repetitive computation. On the other 
hand, further reducing the number of dependent variable from 8 to 7 by substituting the expression for @ in 
Eq. (36) into the balance of linear momenta Eqs. (74)-(75) would be inefficient due to repetitive computation 
of @. 


3.6 Form 4: Balance of momenta with no substitution 


Form 4: Balance of momenta with first-order derivatives. 
1 1 . 1 1 o 
My Sy fy =ox, Ne Sy 2 fy =r (78) 
Nx =4 E fg) = cosa| , SS RM sin@ , (79) 
Ny =3 [a — sind , Sy= RM) cosa , (80) 
al) = Mi, Ul = Px, 0 = Dy , (81) 
al) 2 9) —1/2 
= = (1) (1) 
tana ay 38), B= {[r4+a0) + |p } (43) 
Dependent variables: 11 
{u,0, a}, kh, M, {Px, Py}; {Nx,Ny,Sx,Sy} 


The number of dependent variables can be reduced from 11 to 9 by substituting the expressions for @ in 
Eq. (36) and for & in Eq. (43) into the expressions for normal and shear force components (x, Ny) and 
(Sx, Sy) in Eq. (79)2 and Eq. (80)s, respectively. But the repetitive computations of the same quantities 
(e.g., @ in four different expressions, and & in two different expressions) could make such substitution 
inefficient. 


4 Generic PDEs, auxilliary conditions, loss function 


For short, boundary conditions and initial conditions are together referred to as auxilliary conditions. The 


system in all of the above forms & = 1,...,4—-PDEs and the associated auxilliary conditions—can be 
(k) 


a 


DB”) ({®uj, Or uj, ..., Puy}, (Alu, OF 1 u;,..., uj}, uj, t) = DW” (u(X,t),t) =0, 


generically written as a series of representative dynamic nonlinear differential operators D;”’ as follows: 


withk=1,...,4, i=1,...,n ie We), 


jv. Palen. go lea gaY (82) 
where u;(X,t) are the dependent variables, and 04-u,(X, t) is the pth partial derivative of u; with respect 
to X, and similarly for of Uj (X, t). When ni) = 0, there are no terms with time derivatives, as in the static 
case. For convenience, the following dynamic-operator array, which will be used later, is defined: 

OA lenin,” Yap Bee oe (83) 


wt ? a 
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The loss function for Form k to minimize is 
(k) 


J) = y wi” |o(| ao (84) 
i=1 
) 


where ws; is the weight associated with the differential operator gl* ni’) 


of an “error” function. 


One — Liccuey , playing the role 


5 Numerical examples 


At first, to set up PINN problems of structural dynamics, we opted to use the DeepXDE (abbreviated as 
“DDE”) framework [2], which was likely the most well-documented open-source software for PINN, with 
many examples to guide users and a forum for open discussions of problems users encountered. Specifically, 
since performance depends on the DDE version used, we provide this information. In the present paper, we 
use DDE v1.9.2 or v1.9.3 with the TensorFlow backend, which together is abbreviated as, e.g., DDE-T 
v1.9.3, or simply DDE-T, since DDE v1.9.3 is mostly used in this paper.’ 


In using DDE-T, we encountered many problems such as shift and amplification in the dynamic solu- 
tion, static-solution time history, in addition to the difficulty in designing appropriate learning-rate sched- 
ules, mostly by trial-and-error and accumulated experience that we want to convey to readers, to arrive at 
good solutions. Along the way, we devised various strategies to solve these pathological problems, such as 
different forms of the governing PDE system to solve the shift and amplification problem, and the barrier 
function to avoid static solutions. Even then, a significant amount of time was devoted to get these methods 
to work, compared to our script based on the JAX framework, the High-Performance Array Computing by 
GoogleMind, which did not manifest any of these pathological problems; see Remarks 5.10, 5.19. 


Since DDE-T is accessible to the public, the detailed documentation of the DDE-T pathological prob- 
lems, our proposed solutions and results, would not only be useful for the general readers, but also for the 
developers. 


Because of the large number of parameters, resulting in many different cases, and to allow for conve- 
nient, frequent back-referencing in figure captions and elsewhere in the text, the sections below are organized 
as a series of remarks, each addressing a specific issue/topic with immediate reference to the corresponding 
figures/results for illustration where applicable.’ 


Remark 5.1. Colab GPU time. Even though we did use a dedicated Linux machine with Nvidia GPU 
(Graphics Processing Units) for script development, all results presented here were obtained using Google 
Colab’s Nvidia Tesla K80 GPU with 12 GB RAM (free). See the GPU time in, e.g., Figure 15. a 


Remark 5.2. RunID. At the end of the caption of a figure, there is a RunID, such as “23723R1d-1,’ to 
indicate which run produced that figure: (two images): “23723” being the date 2023 Jul 23, “Rid” being 
“Run 1d,” the order of the DeepXDE script file executed on that date, and “-1” being the first figure (two 
images) coming from that run, from which the final result in Figure 21 with RunID 23723R1d-4 (Step 
200,000) is presented in the paper body, Section 5.3.1 to illustrate the time shift in Form 1 (Remark 5.17, 
Section 5.3.1). 


For example, Figure 12, RunID 23726R5c-1, was obtained with “deepxde-1.9.2 pyaml-23.7.0 scikit-optimize-0.9.0”, 
whereas Figure 22, RunID 2394R la-1, was obtained with “deepxde-1.9.3 pyaml-23.9.1 scikit-optimize-0.9.0”. In the 
follow-up paper, we will report a significant GPU time difference between DDE-T v1.10.0 and DDE-T v1.10.1. 

‘The remarks are important integral parts of the text. It is recommended, particularly for readers not familiar with 
PINN and its training process, not to skip, but to read through all remarks so to be familiar with the corresponding 
illustrative examples on the remark topics. The ultimate objective is to solve nonlinear PDEAs, after developing an 
experience with the training process using DDE-T and JAX to solve standard linear PDEs using the proposed novel 
PINN formulations through a normalization/standardization presented in this paper. 
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A reason for introducing the RunID is because, even though “stochastically reproducible,” the results 
are “deterministically irreproducible,” unless we run under deterministic mode, which is slower, as explained 
in Remark 5.15. For this reason, we kept a large majority of the executed scripts (run under non-deterministic 
mode), the results from a small number of which are presented here. | 


5.1 Network architecture, data-point grids 


Remark 5.3. Computational-domain, network variables, dynamic results. To shortened the description, 
the following parameter symbols are used. For bars and beams, the domain for the space coordinate x is 
the interval [0,1], the domain for time coordinate is [0,7], and the rectangular space-time computational 
domain is |0,1] x [0,7], with N being the number of data points on one of its sides. The data points, with 
a total number of NxN, are used to evaluate the loss functions. For the network architecture, n_inp is the 
number of inputs, W the width of a hidden layer, H the number of hidden layers, and n_out the number of 
outputs. 


Network inputs: For static problems, n_inp=1 (for the data-point x-coordinates), whereas for dynamic 
problems, n_inp=2 (for the data point (a, t) coordinates). Since the value of n_inp is clear from the context, 
it will not be listed in figure captions; see, e.g., Figure 34. 


Network outputs: The number of dependent variables of the PDE(s) depends on the Form used. For 
example, Form 4 of the Kirchhoff-Love rod (Section 3.6) has n_out=11 outputs; see, e.g., Figure 34 for 
Form 2a of a pinned-pinned bar, with n_out=2. 


For results related to dynamic problems, the phrase “time history” is omitted in figure captions for 
conciseness, since the meaning is clear from the context. For example, in Figure 34, the shape (left) and the 
midspan displacement (right) are time histories, with the shape having “x1” as the space x axis and “x2” the 
time t axis, which is also the horizontal axis of the midspan displacement. a 


Remark 5.4. Total parameters of dense networks. As an example, Form 1 of the axial motion of a pinned- 
pinned elastic bar (Section 5.3.1) and a network with n_inp=2 inputs (x,t), W=64 neurons per hidden 
layer, H=4 hidden layers, n_out=1 output u(a,t) lead to 12,737 parameters (weights and biases), Figure 3, 
obtained as follows. 


From the 2 inputs to the 64 neurons in hidden layer 1, there are 64*2 weights (connections) and 64 
biases, thus in total (64*2 + 64) = 192 parameters. From hidden layer 1 to hidden layer 2, there are 64*64 
weights (connections) and 64 biases, thus in total (64*64 + 64) = 1460 parameters. 


Similarly for the two subsequent pairs of hidden layers, the number of weights and biases is 1460 per 
pair, thus in total (64*64 + 64)*3 = 12480 parameters for all three pairs of hidden layers. From the hidden 
layer 4 to the one output layer, there are 64*1 weights (connections) and 1 bias, thus in total (64 + 1) = 65 
parameters. Summing up, there are 12,737 parameters (weights and biases) in total, Figure 3. 


In the model-summary output of the DeepXDE framework [2], Figure 3, the term “layer” is used to 
designate “layer of connections,’ which is formed by two “layers of neurons” as used here, and the type of 
connections mentioned above is termed “dense,” i.e., each neuron in the Ist layer is connected to all neurons 
in the 2nd layer. a 


Remark 5.5. Data-point grids. For the evaluation of the loss functions. We essentially used three types of 
grids: Regular grid, fixed-random grid, and varying-random grid. 

For a regular grid with N € {11,21,31, 41,51} as the number of points’ on, say one side of the 
space-time domain [0,1] x [0,7], containing the space-time point (x,t) with x € [0,1] and t © [0,7], 


> Meaning, the number N of points could take any value in the set {11, 21,31, 41,51}, with 51 as the default number 
in the normalization (or standardization) of the presentation of our results. To alleviate the notation, we also omit the 
overbar on (%, f) to write simply as (2, t). 
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Model: “fnn™ 
Layer (type) Output Shape Param # 
dense_1 (Dense) multiple 4160 
dense_2 (Dense) multiple 4160 
dense_3 (Dense) multiple 4168 
dense _4 (Dense) multiple 65 


Total params: 12,737 
Trainable params: 12,737 
Non-trainable params: @ 


Figure 3: DDE-T. Pinned-pinned elastic bar, axial motion. Model summary. Feedforward 
neural network (fnn). Remark 5.4. Section 5.3.1, Form 1. Dense-connection layers, each 
between two consecutive layers of neurons. Network: Remark 5.3, n_inp=2, W=64, H=4, 
n_out=1, 12737 parameters. Six neuron layers (1 input, 4 hidden, 1 output), five connection 
layers (pairs of consecutive neuron layers). (23721R1-1) 


where for example T € {2,4,8}, there is an even number of intervals on each side (space or time), i.e., 
{10, 20, 30, 40, 50}, respectively, with N x N as the total number of data points, such that one data point is 
at the center of the space-time domain. Figure 34 shows the shape (left) and midspan-displacement (right) 
time histories of the axial motion of pinned-pinned elastic bar using Form 2a (Section 5.3.2), a model with 
N51 W64 H4, having 12,802 parameters, a regular grid, and init_Ir=0.001. By default, N = 51 is used when 
the value of N is not mentioned in a figure caption. Figure 29 shows a static solution obtained using DDE-T 
with Form 2a, the same network and number of parameters and regular grid, but with an init_lr=0.01 (ten 
times larger). Figure 40 shows good shape and midspan displacement using Form 3 (Section 5.3.4), a model 
with N51 W32 H2, having 1,251 parameters, and a regular grid. 


For a random grid, N x N data points are randomly distributed in the space-time (x,t) domain, while 
ensuring that (1) there are N randomly distributed points along the time (t) axis for each of the two boundary 
locations x = 0 and x = 1, and (2) there are N randomly distributed points on the space (x) axis for the 
initial time ¢ = 0. A fixed random distribution is always generated with the seed 42 [21] so to obtain the 
same pseudo-random NV points for every execution. For a varying random grid, no seed is used. 


As an example of a random grid, see Figure 35, which corresponded to one of the lowest total loss 
0.537e-06 for Form 2a. Two reruns of the same exact script that produced Figure 35 yielded the total loss of 
0.626e-06 (RunID 2384R2b) and 0.518e-06 (RunID 2384R2c), both at Step 192,000, respectively. Figure 36 
shows that stopping earlier at Step 146,000 at which the total loss of 7.22e-06 was lowest in Cycle-4, with 
quasi-perfect midspan displacement and GPU time 442 sec, is a good trade-off. To give more information, 
the loss function was shown in Figure 36, since the random grid is similar to that in Figure 35. An even better 
trade-off is given in Figure 39, with a total loss of 1.179e-06, a quasi-perfect midspan displacement with 
damping%=0.1% and GPU time 278 sec. For an additional example of random grids, see also Figures 42-44. 


Switching from a regular grid to a random grid, while keeping all other parameters the same, could 
lower the total loss; see Remark 5.9 on “Extended learning-rate schedule” (ELRS) for an example. a 


5.2 Training (optimization) learning-rate scheduling. 


The Adam optimizer is used in all examples in the present paper, with learning-rate decay over Ngteps, 
the total number of steps, which is divided into mcycles, the number of decay cycles, each of which contains 
N-StEPSycje» the number of steps per cycle. This n-steps,,.j, may vary from cycle to cycle, and is subdivided 


iNtO Nperiods, the number of decay periods, each of which contains N-StEPS period? the number of steps per 
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period; see Figure 4. This n-steps,..ioq iS generally fixed when compiling a model with decay, with the 


N-StEPSpycje Chosen as a multiple of n-steps 


|. Cycle 1 +. Cycle 2 : Cycle 3 | 
\~< 


i 
Mee eed 


Period 
Figure 4: Optimization learning-rate scheduling. Section 5.2. Cycles and periods. 


period: 


After an initial-learning rate for a number of cycles was chosen, e.g., init_lr = 0.001, in any given 
period (p) within a cycle, the learning rate €(p.1) at the end of the previous period (p-1) is decayed to the 
value €(») = Auecay * €(p-1), With Zaecay being a decay factor less than one, following a decay function, such 
as the “inverse time” function. 


After each cycle and at the beginning of the subsequent cycle, the learning rate is reset to the initial- 
learning rate, similar to cyclic annealing, for which a detailed explanation can be found in [1]. 
As an example, we could set in a cycle n-steps, = 50, 000, n-steps 


20, and “decay = 0.9. 


The number of cycles neycies and the total number of steps Ngteps vary depending on the convergence 
properties of an execution. With n-steps.y.j. = 50,000 and n-steps,eriog = 2,500, we could set neycles = 4 
and Neteps = 200, 000 to “run all” these cycles one after another at once, or we could also set n-steps.yoie 
25, 000 then run only the first cycle to examine the solution obtained to decide whether to stop (in the case 
of reaching a static solution in a dynamic problem, as shown in Figure 48 right and Figure 49 left) or to 
continue the optimization process, possibly with different values for n-stepScycie, Neycless aNd Nssteps. 


a = 2,500, so that Nperiods = 


cycle perio 


Wy 
1.0 1.0 


Figure 5: Axial motion of elastic bar. Section 5.3. Mathematica solutions. Eq. (64) with 
slenderness 4 = 1 and distributed load fy = 1/2. Two vibration periods for each set of 
boundary conditions. Left: Pinned-pinned bar, with X € [0,1] and # € [0,4]. Right: Pinned- 
free bar, with X € [0,1] andt € [0,8]. 


Remark 5.6. Learning-rate schedule I (LRS 1). Cyclic annealing (CA) with same learning rate. To allow 
for a systematic comparison of the PINN results, LRS 1 is set to consist of neycles = 5 cycles, with Cycles 
1 and 2 having n-steps = 25,000 steps, totaling 50, 000 steps at the end of cycle 2, whereas Cycles 3, 
4, 5, each has n-steps.yoje = 50, 000 steps, making a total of Nsteps = 200, 000 steps over these 5 cycles. 
Computational results (loss function, deformed shape, GPU time) would be gathered at the end of these 


cycle 
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0.06 + 


0.05 + 


0.01 4 


0.0 05 10 1s 20 25 3.0 35 40 
Figure 6: DDE-T. Pinned-pinned bar, cyclic annealing (CA). Static-shape (left), static 
midspan displacement (right), Step 50,000. * Section 5.3.2, Form 2a. Remarks 3.3 (Static 
solution), 5.20 (How to avoid). Network: * Remarks 5.3, 5.5, T=4, W=32, H=2, n_out=2, 
He-uniform initializer, 1,218 parameters, regular grid. Training: x Remark 5.6, LRS 1, 
init_lr=0.07, n-cycles=2, N_steps=50,000. e Figure 7, NCA, same model and init_lr=0.07, waves, 
damping. > Figure 5, reference solution to compare. (23817R10a-1) 


0.10 4 


0.08 + 


0.06 + 


0.04 + 


0.02 4 


0.00 4 


0.0 05 10 15 20 25 3.0 35 4.0 
Figure 7; DDE-T. Pinned-pinned bar, No cyclic annealing (NCA). Shape (left), midspan dis- 
placement (right), Step 50,000, waves with damping. x Section 5.3.2, Form 2a. Remark 5.14, 
Damping. Network: x Remarks 5.3, 5.5, T=4, W=32, H=2, n_out=2, He-uniform initial- 
izer, 1,218 parameters, regular grid. Training: Remark 5.8, LRS 3, init_lr=0.07, n-cycles=2, 
N_steps=50,000. ¢ Figure 6, CA, same model and init_lr=0.07, static solution. Figures 10 (Form 
2a, 1,218 parameters), 40-41 (Form 3, 1,251 parameters), VCA, low damping. Figures 42-44, Form 3, 
1,251 parameters, random grid, guasi-perfect solution. (23817R10b-1) 
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— Train loss 

—— Test loss 0.12 4 

0.10 5 

0.08 5 

0.06 4 

0.04 5 

0.02 5 

0 25000 50000 75000 100000 125000 150000 175000 200000 oe 


# Steps 0.0 05 Lo 15 2:0 25 3.0 3. 5 4.0 

Figure 8: DDE-T. Pinned-pinned bar, cyclic annealing (CA). Loss function (left), midspan 
displacement, Step 200,000 (right), waves with damping. « Section 5.3.2, Form 2a. Network: 
* Remarks 5.3, 5.5, T=4, W=64, H=2, n_out=2, He-uniform initializer, 4,482 parameters, x 
regular grid. Training: x Remark 5.6, LRS 1 (CA), init_Ir=0.03. e Figure 9, 1,218 parameters, 
CA, init_lr=0.04, static solution. > Figure 5, reference solution to compare. (23815R3b-1) @ The train- 
loss history in blue coincides with the test-loss history in orange, and is not visible in the plot. The same 
situation occurs in subsequent loss-history figures. 


— Train loss 
10° 4 —— Test loss 


0.06 + 


0.05 4 


0.04 + 


0.03 + 


0.02 4 


0.01 4 


0 25000 50000 75000 100000 125000 150000 175000 200000 2.097 { | { { | f f | { 

# Steps 0.0 0.5 1.0 15 2.0 2.5 3.0 3.5 4.0 
Figure 9: DDE-T. Pinned-pinned bar, cyclic annealing (CA). Loss function (left), midspan 
displacement, Step 200,000 (right), static solution. * Section 5.3.2, Form 2a. Network: x 
Remarks 5.3, 5.5, T=4, W=32, H=2, n_out=2, He-uniform initializer, 1,218 parameters, x 
regular grid. Training: x Remark 5.6, LRS 1 (CA), init_lr=0.04. e Figure 8, 4,482 parame- 
ters, CA, init_lr=0.03, waves, damping. Figure 10, VCA, Form 2a, same model, waves, small damping. 
(23813R4b-1) 
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5 cycles. Each period is kept at a constant n-stepSpering = 2,500 steps, and has a learning-rate decay to 
faecay = 90% by inverse time.° 

Any deviation from the LRS 1 would be clearly indicated. For example, we may run a case using only 
two cycles, n-cycles=2, init_lr=0.07, and then stop the execution after N_steps=50,000, such as in Figure 6. 

Using DDE-T, while CA led to waves with damping as in Figure 8 (4,482 parameters, init_Ir=0.03, 
regular grid), but CA could also lead to a static solution as in Figure 6 and Figure 9 (1,218 parameters, 
init_Ir=0.04, regular grid). For the counterpart with “no cyclic annealing” (NCA), see Remark 5.8 on LRS 3. 

It is important to note that the static solution is one of the pathological problems we encountered with 
using DDE-T (Figure 27), but did not appear when using JAX (Figure 28); see Remarks 5.10, 5.19. 

A characteristic of CA is the jump in the loss function after each resetting of the initial learning rate at 
the beginning of each cycle; see the left subfigures of Figures 8-9. a 


—— Train loss 
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Figure 10: DDE-T. Pinned-pinned bar, Varying init_Ir cyclic annealing (VCA). Loss func- 
tion (left), midspan displacement (right), Step 200,000, waves, small damping. Section 5.3.2, 
Form 2a. « Remark 5.14, Damping. Network: Remarks 5.3, 5.5, T=4, W=32, H=2, n_out=2, 
He-uniform initializer, 1,218 parameters, regular grid. Training: x Remark 5.7, LRS 2 (VCA), 
init_Ir=[0.04, 0.03, 0.02, 0.01, 0.005]. @ Figure 9, CA, Form 2a, same model, init_Ir=0.04, static so- 
lution. > Figure 40, VCA, Form 3, same model and init_Ir sequence, waves, small damping. > Figure 5, 


reference solution to compare. (23821RIa-1) 


Remark 5.7. Learning-rate schedule 2 (LRS 2). Varying initial-learning-rate cyclic annealing (VCA). 
Each cycle has its own learning rate, which may (or may not) vary in the subsequent cycles. An example 
would be to use the same parameters as in LRS 1 (Remark 5.6), except for the different initial learning rate at 
the beginning of each cycle as prescribed by the list init_lr = [0.02, 0.02, 0.02, 0.01, 0.005], with decreasing 
initial learning rates, such that init_lr = 0.02 is used for Cycle 1, 2, 3, init_lr = 0.01 for Cycle 4, and init_Ir 
= 0.005 for Cycle 5. 


Figure 10, VCA with Form 2a, Figures 40-41, VCA with Form 3, the same model as in Figure 9 (N51 
W32 H2, 1,218 parameters, regular grid), and init_Ir sequence [0.04, 0.03, 0.02, 0.01, 0.005] led to waves 
with small damping. | 


°The learning rate at the end of a period is equal to 90% of the learning rate at the beginning of that period. 
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Figure 11: DDE-T. Pinned-pinned bar, NO cyclic annealing (NCA). Loss function (left), 
midspan displacement, Step 200,000 (right), waves with damping. x Section 5.3.2, Form 2a. 
Network: * Remarks 5.3, 5.5, T=4, W=32, H=2, n_out=2, He-uniform initializer, 1,218 pa- 
rameters, « regular grid. Training: x Remark 5.8, LRS 3 (NCA), init_lr=0.04. e Figure 9, CA, 
same model, static solution. Figure 32, NCA, model with 4,482 parameters, static solution. > Figure 5, 


reference solution to compare. (23813R4a-1) 


Remark 5.8. Learning-rate schedule 3 (LRS 3). No cyclic annealing (NCA), i.e., no resetting of the learn- 
ing rate at the beginning of each cycle, unlike LRS | in Remark 5.6. The number of cycles are just break- 
points to plot intermediate results. For example, a training with n-cycles=1 and N_steps=200,000 corre- 
spond to one breakpoint at Step 200,000, whereas a training with n-cycles=5 and N_steps=200,000 has five 
breakpoints. The initial learning rate init_lr set at the beginning of Cycle 1 decreases with inverse-time 
decay without being reset to init_Ir at the beginning of each subsequent cycle—unlike LRS 1 in Remark 5.6 
and LRS 2 in Remark 5.7—n-steps_period=2500, faecay = 0.9, and N_steps set to either 25,000, 50,000, 
100,000, or 200,000 steps for standardization. 


Figure 7, model with N51 W32 H2, having 1,218 parameters, shows waves with damping in the midspan 
displacement at Step 50,000 when using NCA with init_Ir=0.07. 


Figure 11, NCA with init_lr=0.04 and a model having 1,218 parameters led to waves with damping. 
Figure 32, NCA with init_Ir=0.03 and a model having 4,482 parameters led to static solution. To avoid 
the static solution, reduce the initial learning rate init_lr for the same model capacity, or reduce the model 
capacity (with a possible increase in initial learning rate as done in Figure 11); see Remark 5.13. a 


Remark 5.9. Extended learning-rate schedule (ELRS). In some examples, N_steps may be extended be- 
yond 200,000, such as 400,000. The extension could be in CA mode (Remarks 5.6-5.7) or in NCA mode 
(Remark 5.8). 


In NCA mode, the learning rate simply continues to decay from the end of Cycle 5 without being reset. 
Doing so is equivalent to extend n-steps_cycle for Cycle 5 from 50,000 steps to 250,000 steps. This Cycle 
5 extension is applicable to both LRS 1 in Remark 5.6 and LRS 2 in Remark 5.7. An example of Cycle-5 
extension for Form 2a and a low-capacity model N51 W32 H2 with 1,218 parameters is given in Figure 12, 
loss function and shape time history at Step 400,000; Figure 13, midspan-displacement time histories at Step 
200,000 and at Step 300,000; Figure 14, velocity and quasi-perfect midspan-displacement time histories at 
Step 400,000. The loss function in Figure 12, together with the midspan-displacement time histories in 
Figures 13-14 gives an idea of the gradual improvement of the solution during the optimization process. 
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Figure 12: DDE-T. Pinned-pinned bar, Extended learning-rate schedule (ELRS). Loss func- 
tion (left), shape, Step 400,000 (right). « Section 5.3.2, Form 2a. Network: « Remarks 5.3, 5.5, 
T=4, W=32, H=2, n_out=2, He-uniform initializer, 1,218 parameters, « regular grid. Train- 
ing: x Remarks 5.6 (LRS 1), 5.9 (ELRS), init_lr=0.02, Cycles 1-5 (CA), Cycles 6-9 (NCA), 
N_steps=400,000. « Lowest total loss 2.19e-06, Step 400,000 (sum of 6 losses). @ Figure 13, 
midspan displacements, Steps 200,000 & 300,000. Figure 14, velocity, very-good midspan displace- 
ment, Step 400,000. > Figure 5, reference solution to compare. (23726R5c-1) 
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Figure 13: DDE-T. Pinned-pinned bar, ELRS. Midspan displacements: Step 200,000 (left), 
Step 300,000 (right). Section 5.3.2, Form 2a. Network: x Remarks 5.3, 5.5, T=4, W=32, 
H=2, n_out=2, He-uniform initializer, 1,218 parameters, « regular grid. Training: x Re- 
marks 5.6 (LRS 1), 5.9 (ELRS), init_Ir=0.02, n-cycles=9, Cycles 1-5 (CA), Cycles 6-9 (NCA), 
N_steps=400,000. e Figure 12, loss function, shape, Step 400,000. Figure 14, velocity, very-good so- 
lution, Step 400,000. (23726R5c-2) 
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Figure 14: DDE-T. Pinned-pinned bar, ELRS. Velocity (left) and very-good midspan dis- 
placement (right), Step 400,000. « Section 5.3.2, Form 2a. Network: « Remarks 5.3, 5.5, T=4, 
W=32, H=2, n_out=2, He-uniform initializer, 1,218 parameters, regular grid. Training: x Re- 
marks 5.6, (LRS 1), 5.9 (ELRS), init_lr=0.02, n-cycles=9, Cycles 1-5 (CA), Cycles 6-9 (NCA), 
N_steps=400,000. e Figure 12, loss function, shape, Step 400,000. Figure 13, midspan displacements, 
Steps 200,000 & 300,000. Figure 35, Form 2a, 12,802 parameters, random grid, quasi-perfect solution. 
> Figure 5, reference solution to compare. (23726R5c-3) 


The smallest total loss at Step 400,000 was 2.19e-06, which was the sum of six losses: PDE, momentum, 
two boundary conditions, and two initial conditions. 


Switching from regular grid to random grid, maintaining all other parameters the same, could lower 
the total loss. For example, using Form 3 with the model N51 W32 H2 having 1,251 params, and init_Ir 
sequence [0.04,0.03,0.02,0.01,0.01,0.005] for Cycles 1-6 (VCA), and then Cycles 7-9 (NCA), using a reg- 
ular grid yielded a total loss of 2.56e-06, the sum of seven losses for Form 3, whereas using a random grid 
yielded a smaller total loss of 1.25e-06, both at the same last Step 400,000. |_| 


Remark 5.10. Piecewise-constant learning-rate schedule. (LRS 4) While the inverse-time decay is used 
in LRS 1 (Remark 5.6), LRS 2 (CA, Remark 5.7), LRS 3 (NCA, Remark 5.8), in our PINN script written 
using the JAX framework, due to time constraint,’ we only implemented the piecewise-constant schedule of 
the Optax gradient processing and optimization library for JAX. 


An example of LRS 4 would be: init_Ir=0.003, with factor_Ir=[0.9, 0.8, 0.7, 1, 1], so that the learning 
rate decays as follows: 0.003 in Cycle 1 (up to Step 25,000), (0.003 * 0.9) = 2.7e-03 in Cycle 2 (from Step 
25,000+1 to 50,000), (2.7e-03 * 0.8) = 2.16e-03 in Cycle 3 (from 50,000+1 to 100,000), (2.16e-03 * 0.7) 
= 1.512e-03 in Cycle 4 (from 100,000+1 to 150,000), 1.512e-03 in Cycle 5 (from 150,000+1 to 200,000), 
1.512e-03 in Cycle 6 (from 200,000+1 to 250,000). 


Hence LRS 4 is also a NCA schedule as LRS 3 (Remark 5.8), and can be extended beyond 200,000 
steps as in the above example. Figure 28 depicts the motion of a pinned-free bar under constant distributed 


’The main reason was the time constraint due to the approaching deadline for this special issue, as we only started 
developing our JAX script after spending a significant amount of time with DDE-T without satisfactory results, and 
had our JAX script in working order as the deadline was already looming close. There is no technical reason for not 
implementing the inverse time decay or any other time decay methods. 
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axial load (Figure 5) obtained with JAX and LRS 4. See also Remark 5.19, DDE-T vs JAX, which exhibits 
none of the DDE-T pathological problems. | 


Remark 5.11. Jnitializer: “He uniform” vs. “Glorot uniform.” The He-uniform initializer is equivalent to 
a larger learning rate, compared to the Glorot-uniform initializer. As a result, the He-uniform initializer may 
help the training converge faster, but could push the iterate to a static solution, depending on the PDE and 
its form. 


Consider Form 2a of a pinned-free bar, using a network with N51 W64 H4, 12,802 parameters, and 
the learning-rate scheduling LRS 3 (no cyclic annealing), and init_Ir=0.005. Figure 37 uses the He-uniform 
initializer, with regular grid, yielded a static solution. Switching from regular grid to random grid (Re- 
mark 5.20), keeping the same network architecture and number of data points, i.e., 12,802 parameters, 
Figure 38, also using the He-uniform initializer, yielded a quasi-perfect solution. 


From the pre-static solution of a pinned-free bar shown in SubFigs 45a-45b obtained with Form 3, 
He-uniform initializer, and initial learning rate init_lr=0.005, simply switching from He-uniform to Glorot- 
uniform initializer, which is equivalent to reducing the learning rate (even though init_Ir remained at 0.005), 
resulted in a very-good free-end displacement shown in SubFigs 45g-45h. | 


Remark 5.12. Rule of thumb for learning rate. A quick way to assess whether a learning rate would lead 


to the convergence to a solution (even though the converged solution may not be the one desired), is to run 
a script with zero applied force to see how the optimization converge to the zero solution. | 
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00 05 10 15 20 25 30 35 40 
Figure 15: DDE-T. Pinned-pinned bar, lower-capacity network I. Shape (left), midspan dis- 
placement (right), Step 200,000. « Remark 5.13, Optimal capacity. Remark 5.16, Unstable 
solution. Section 5.3.2, Form 2a. Network: x Remarks 5.3, 5.5, T=4, W=64, H=2, n_out=2, 
He-uniform initializer, 4,482 parameters, « regular grid. Training: Remark 5.6 (LRS 1), 
init_lr=0.01. *« Lowest loss value 2.12e-06, Step 200,000. Total GPU time 392 sec. e Fig- 
ure 34, Form 2a, 12,802 parameters, regular grid, small damping, and Figure 35, random grid, visually 
quasi-perfect solution. Figure 16, Form 2a, 1,218 parameters, regular grid, damping. Figures 42, 43, 44, 
Form 3, 1,251 parameters, visually quasi-perfect solution. > Figure 5, reference solution to compare. 
(23816R1a-1) 
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Figure 16: DDE-T. Pinned-pinned bar, lower-capacity network 2. Midspan displacement, 
damping, Step 200,000. « Remark 5.13, Optimal capacity. Section 5.3.2, Form 2a. Network: 
Remark 5.3, 5.5, T=4, W=32, H=2, n_out=2, He-uniform initializer, 1,218 parameters, regular 
grid. Training: init_lr=0.01. Left: Remark 5.6, LRS 1 (CA) (23816R2a-1); Right: Remark 5.8, 
LRS 3 (NCA) (23816R2b-1). @ Quasi-perfect solutions: Figure 35, Form 2a, 12,802 parameters, random 
grid. Figure 15, Form 2a, 4,482 parameters, regular grid; Remark 5.16, Unstable solution. Figures 42- 
44, Form 3, 1,251 parameters, random grid. Figure 14, Form 2a, 1,218 parameters, regular grid. 


Remark 5.13. Optimal network capacity. Figure 35 shows a quasi-perfect solution using the network 
N51 W64 H4, with 12,802 parameters: Lowest total loss 0.537e-06, Step 192,000; Total GPU time 640 
sec. A question would be could a network with smaller capacity (fewer parameters) achieve the same (or 
similar) results? Recall Remark 5.8, in which Figure 11, NCA with init_lr=0.04 and a model having 1,218 
parameters led to waves with damping. 


Figure 15 shows a much lower-capacity model with roughly one third the number of parameters, namely 
4,482, and yet also produced similarly excellent results: Lowest loss value 2.12e-06, Step 200,000; Total 
GPU time 392 sec. Could the model capacity be further reduced? 


Figure 16 shows a model with even lower capacity at 1,218 parameters, for which damping in the 
time histories was clearly significant, less damping with cyclic annealing (Remark 5.6, LRS 1), and more 
damping without cyclic annealing (Remark 5.8, LRS 3). But is it possible to reduce damping and lower 
the loss value with 1,218 parameters? Yes, indeed, see Figure 42-43: 1,251 parameters; Lowest total loss 
1.25e-06, Step 400,000; Total GPU time 598 sec; quasi-perfect solution. 


Of the above three models, the second model with 4,482 parameters is the best for the axial motion 
of a pinned-pinned bar. See Remark 5.14 on how to avoid damping in low-capacity models using varying 
initial-learning-rate cyclic-annealing (VCA) (Remark 5.7) and extension of learning-rate schedule (ELRS) 
(Remark 5.9). 


For the pinned-free bar, in Figure 28, the network with N51 W32 H2, with 1,185 parameters, has the 
optimal capacity, providing low total loss (lowest among these four cases), quasi-perfect damping% in the 
free-end displacement, and lowest GPU time. a 


Remark 5.14. Damping in low-capacity models. For a pinned-pinned bar using Form 2a and a model with 
N51 W32 H2, having 1,218 parameters, stopping at N_steps=200,000, the lowest damping was obtained 
around the initial learning rate init_lr=0.03, which yielded a ratio of 1st peak to 2nd peak around 0.12 /0.11 
= 1.09; Figure 17 (shape, midspan displacement, Step 200,000), Figure 18 (loss function, velocity, Step 
200,000). Several simulations using this same model and a wide range of initial learning rate, from 0.001 
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Figure 17: DDE-T. Pinned-pinned bar, damping in low-capacity networks. Shape (left), 
midspan displacement (right, damping), Step 200,000. *« Remark 5.14, Damping. Sec- 
tion 5.3.2, Form 2a. Network: Remarks 5.3, 5.5, T=4, W=32, H=2, n_out=2, He-uniform 
initializer, 1,218 parameters, random grid. Training: Remark 5.6, LRS 1, init_lr=0.03. e 
Figure 18, loss function, velocity, Step 200,000. Figure 19, midspan displacements, init_lr=0.001 and 
init_Ir=0.1. > Figure 5, reference solution to compare. (23817R3a-1) 
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Figure 18: DDE-T. Pinned-pinned bar, damping in low-capacity networks. Loss function 
(eft), velocity (right), Step 200,000. Remark 5.14, Damping. x Section 5.3.2, Form 2a. Net- 
work; Remarks 5.3, 5.5, T=4, W=32, H=2, n_out=2, He-uniform initializer, 1,218 parameters, 
random grid. Training: x Remark 5.6, LRS 1, init_lr=0.03. e Figure 17, shape, midspan displace- 
ment, Step 200,000. Figure 19, midspan displacements, init_lr=0.001 & init_lr=0.1. (23817R3a-2) 
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Figure 19: DDE-T. Pinned-pinned bar, damping in low-capacity networks. Left: Waves, 
damping, Step 200,000. Right: Static solution, Step 50,000. *« Remark 5.14, Damping. Sec- 
tion 5.3.2, Form 2a. Network: Remarks 5.3, 5.5, T=4, W=32, H=2, n_out=2, He-uniform 
initializer, 1,218 parameters, random grid (left), regular grid (right). Training: x Left: Re- 
mark 5.6, LRS 1, init_lr=0.001 (23817R2-1). Right: Remark 5.8, LRS 3, init_lr=0.1, n-cycles=2, 
N_steps=50,000 (23817R11-1). 


(Figure 19, left) to 0.1 (Figure 19, right), damping was inescapable in the results. Note that the damping 
with the smaller init_lr=0.001 was more pronounced than the damping with the 30 times larger init_Ir=0.03. 


So at first, damping appeared to be a characteristic of low-capacity models with N_steps below 200,000 
until we tried the varying initial-learning-rate cyclic annealing (VCA) mentioned in Remark 5.7 and Fig- 
ure 10 (Form 2a, 1,218 parameters), and Figures 40-41 (Form 3, 1,251 parameters), yielding much lower 
damping compared to that in Figure 17 and Figure 19, left. 


Damping can further reduced after VCA scheduling by extension of learning-rate scheduling (ELRS) 
to, say, Step 400,000. An example is given in Figures 42-44, Form 3, 1,251 parameters, random grid, quasi- 
perfect solution. See also Remark 5.16, Unstable solution for another example of VCA-ELRS leading to 
quasi-perfect solution. | 


Remark 5.15. Stochastic reproducibility. The results are only reproducible in the stochastic sense, i.e., the 
results could vary a little, even though qualitatively similar, when rerunning the same identical DeepXDE 
script, with exactly the same parameters. There are several sources of stochasticity: (1) The stochastic 
optimization initializer, i.e, either “Glorot uniform” or “He uniform” [1] (2) the density and probabilistic 
distribution of the data points in the (random) grid for the evaluation of the loss function. In other words, 
the results while stochastically reproducible are deterministically irreproducible. 


DDE-T allows for setting the random_seed to a number, such as “42” (see fixed random grid in Re- 
mark 5.5), in its configuration to produce deterministically reproducible results at the cost of slowing down 
the computation, and therefore this setting should be used only for debugging.’ As an example, using the 
same model with 4,482 parameters and 200,000 steps as in Figure 15 resulted in a 10% increase in com- 
putational time. See Remark 5.23 regarding the opposite for our JAX script, for which the deterministic 
mode is more efficient than the non-deterministic mode, an unexpected surprise.’ a 


’For DDE-T, see the function definition def set_random_seed (seed) in Source code for deepxde.config, and 
the command dde. config.set_random_seed (42) is used for determinism. 

For JAX, the commands os.environ["XLA_FLAGS"] = "--xla_gpu_deterministic_ops=true"and 
key = jax.random.PRNGKey (42) are used for determinism. 
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Remark 5.16. Unstable solution and method to fix. Sometimes, the solution could be unstable in the sense 
that it could not be hardly be recovered as if it occurred by chance such as that in Figure 15, which could 
never again be reproduced. Rerunning the exact same script again several times, from which Figure 15 was 
obtained, led to pre-static or static solutions. That’s stochastically irreproducible. 

When such situation occurs, gradually reducing the initial learning rate with cyclic annealing, using 
Remark 5.7 (VCA), and extending the learning-rate schedule, Remark 5.9 (ELRS), would usually help. For 
the case of Figure 15 with lowest total loss of 2.12e-06 at Step 200,000, using VCA for Cycles 1-5 with 
init_Ir = [0.01, 0.009, 0.008, 0.007, 0.006] and NCA for Cycles 6-9, and N_steps = 400,000 steps, yielded a 
quasi-perfect solution with lowest total loss 0.793e-06 at Step 400,000 with total GPU time 748 sec (RunID 
23831R2c). This VCA-ELRS solution is stochastically reproducible, i.e., stable in the sense that rerunning 
the script several times led to stochastically equivalent solutions. 

Hence with the above VCA-ELRS, a good trade-off would be at Step 200,000 with 4,482 parameters, 
lowest total loss 1.92e-06 at Step 200,000, total GPU time 401 sec (RunID 23831R2c), and a quasi-perfect 
solution not distinguishable from that in Figure 35, which was obtained with 12,802 parameters, lowest total 
loss 0.537e-06 at Step 192,000, total GPU time 640 sec. | 


5.3 Axial motion of elastic bar 
5.3.1 Form 1: Shift, amplification, early stopping, static solution 
The numerical evidence presented below shows that our JAX script solves many problems encountered 
with DDE-T, specifically: 
¢ For the pinned-pinned bar, DDE-T produced shift and amplification in the solution, whereas our JAX 
script shows no such problems. To solve these particular DDE-T problems (shift and amplification), 
we devised several methods (Form 2a, Form 3, Form 4) before writing our own JAX script. 
¢ For the pinned-free bar, DDE-T produced a static-solution time history, i.e., not dynamic solution, 
regardless of the size of the learning rate or the network capacity, whereas our JAX script shows no 
such problem. To solve this particular DDE-T problem (static solution), we devised several methods 
(barrier function, reduced network capacity, different learning-rate schedules) before writing our own 
JAX script. 
Form 1. The Form 1 of the axial equation of motion of an elastic bar is a particular case of the Form 1| for 
the Kirchhoff-Love rod in Section 3.2, and was given above in Eq. (64), reproduced below for convenience: 


su) + fy =t, (64) 
where 4 = 1 is the slenderness of the bar, defined in Eq. (32), fy = 7 with the following two types of 
prescribed boundary conditions: 

Pinned-pinned bar, using Eq. (47); for X = 0 and a similar one for X = 1: 

Qs =0),02)=aX =1,7)=0 (85) 


Pinned-free bar, using Eq. (47); and Eq. (48)1, and the extension € from Eq. (6) and Eq. (27): 


= L oa = = = i feos 
u(X =0,#)=u=0, @(X = 1,8) =a (1,2) = 5N =0 (86) 
The prescribed initial conditions, using Eq. (49), are: 
U(X, 0) = to(X) = 0, U(X,0) = %(X) =0. (87) 
The reference solutions for the pinned-pinned bar and for the pinned-free bar are given in Figure 5. 


Since both the strain 7) and the velocity w are not explicit outputs in Form 1, these boundary and initial 
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conditions cannot be explicitly enforced using hard constraints, but have to be imposed after computing the 
derivatives of the outputs using backpropagation and then included as additional squared errors in the overall 
loss function (soft constraints). 


Remark 5.17. DDE-T vs JAX, Form 1, pinned-pinned bar: Time shift, amplification, damping percent- 
age. DDE-T with Form 1 produced a shift and amplification in the solution and larger errors in boundary 
conditions with derivatives (Neumann type). JAX does not exhibit these pathological problems. 


For a pinned-pinned elastic bar subjected to a constant distributed axial load, Figure 20 shows two 
vibration periods of the deformed-shape time history and the midspan-displacement time history, shifted 
to the right, at training Step 100,000 obtained from a regular grid, whereas Figure 21 shows the same 
output variables at Step 200,000 obtained from a fixed random grid. Also for the same pinned-pinned bar, 
Figures 22-23 show the shift (time and vertical) and amplification parameters increase continuously with 
training step number, as the training progressed to the final Step 400,000. 


Using JAX, Figure 24 shows no shift since the times of the local maxima were exactly where they should 
be, and so were the times for the local minima. There was no amplification, except for a small damping. 
For results obtained with JAX, the quality of a response curve is measured by the damping percentage 
(damping% for short) defined by the ratio of the Ist local maximum over the 2nd local maximum minus 
one, with amplification% = 1 - damping% > 0. For example, the midspan displacement of a pinned-pinned 
bar in Figure 24 has a damping% of 1.7%, which is discernible. The quality of the response is classified by 
the damping% as follows: 


* | damping% |< 0.5% : Quasi-perfect, damping/amplification is not discernible by naked eyes. 
* | damping% |€ [0.5%, 1.5%) : Very good, small damping, could be discernible by naked eyes. 
¢ | damping% |e [1.5%, 3.0%) : Good, significant damping, discernible by naked eyes. 
* | damping% |> 3.0% : High damping, easily visible. 
a 


Remark 5.18. DDE-T vs JAX. Form I, pinned-pinned bar: Early stopping and divergence. After the 
lowest loss was reached earlier in the training process, and the loss function stopped to decrease after this 
lowest value, the training process can then be stopped earlier. Depending on the behavior of the loss function 
after the lowest loss, the training could diverge. See also “early stopping” in [1]. 


Using DDE-T, Figure 25 shows the loss function (left) with the lowest value of 2.30e-05 remaining at 
Step 24,000 as the training progressed to Step 100,000, with plateaus at loss value of 0.25, many orders of 
magnitude above the lowest loss, showing divergence. Once a plateau appears, there is no need to continue 
further, and stop the training process early, e.g., at Step 50,000 (or before). On the right subfigure is the 
shape time history at Step 24,000 (the lowest loss). 


Figure 26 ( DDE-T) shows a good midspan displacement with damping at Step 25,000, the last step at 
the end of Cycle 1 (left) and zero midspan displacement at Step 100,000 (divergence). 


Using JAX, a similar behavior of “early stopping” was also observed. a 


Remark 5.19. DDE-T vs JAX, Form I, pinned-free bar: Static solution. Our DDE-T script for Form 1 
produced a static-solution time history in every case. Our JAX script does not exhibit this pathological 
problem. For the pinned-free bar subjected to a constant distributed axial load, DDE-T produced a static 
solution regardless of the size of the learning rate and the network capacity (Figure 27), whereas JAX 
produced no static solution, but the expected waves with two vibration periods (Figure 28), as obtained with 
Mathematica in Figure 5. 
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0.08 


0.0 0.5 1.0 15 2.0 2.5 3.0 3.5 
Figure 20: DDE-T. Pinned-pinned bar. Shape (left), midspan displacement shifted to the 
right with small amplification (right), Step 100,000. « Section 5.3.1, Form 1. Network: Re- 
marks 5.3, 5.5, T=4, W=64, H=4, n_out=1, Glorot-uniform initializer, 12,737 parameters, regu- 
lar grids. Training: x Remark 5.8, LRS 3 (NCA), init_Ir=0.001, n-cycles=3, N_steps=100,000. 
* PDE loss 2.20e-06, Cycle 3 GPU time 154 sec. p> Figure 5, reference solution to compare. 
(23722R1-1) 


0.0 0.5 10 15 2.0 2.5 3.0 3.5 
Figure 21: DDE-T. Pinned-pinned bar. Shape (left), midspan displacement shifted to the 
right with small amplification (right), Step 200,000. « Section 5.3.1, Form 1. Network: Re- 
marks 5.3, 5.5, T=4, H=4, W=64, n_out=1, Glorot-uniform initializer, 12,737 parameters fixed 
random grid. Training: * Remark 5.6, LRS 1 (CA), init_Ir=0.001. x Lowest total loss 4.3 1e-06, 
Step 200,000 (sum of 5 losses). « Total GPU time 621 sec. e Figures 42-44, Form 3, VCA-ELRS, 
1,251 parameters, guasi-perfect solution (no shift), lowest total loss 1.25e-06, total GPU time 598 sec. 
> Figure 5, reference solution to compare. (23723R1d-4) 
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Figure 22: DDE-T. Pinned-pinned bar, continued shift & amplification. x Midspan displace- 
ments: Step 50,000 (left), Step 400,000 (right). « Section 5.3.1, Form 1: Hidden parameters 
(4, ¢, @), time shift 4, vertical shift ¢, amplification @, quasi-perfect peak-to-peak amplitude 
A. Network: Remarks 5.3, 5.5, T=4 W=64, H=4, n_out=1, He-uniform initializer, 12,737 pa- 
rameters, regular grid. Training: Remark 5.6, LRS 3 (NCA), init_Ir=0.005, N_steps=400,000. 


e Figure 23, shift-amplification parameters increase with training step number. (2394R1a-1) 


Form 1, pinned-pinned bar: Shift and amplification 
Time shift, Minimum (vertical shift), Peak-to-peak amplitude, Initial slope (velocity) 


10° 4 — Train loss 


@ Timeshift A Vertical shift © PtP Amplitude @ Initial velocity —— Test loss 


0.00 + + + + + + + t t t + + t + 1 
0 25 50 75 100 125 150 175 200 225 250 275 300 325 350 375 400 


T T + T T T T T T 
0 50000 100000 150000 200000 250000 300000 350000 400000 
Step number * 0.001 # Steps 


Figure 23: DDE-T. Pinned-pinned bar. Form I. Appendix 1, Analysis. Scaled shift- 
amplification parameters (3, ¢c, @) and initial velocity increase with training step number (left), 
while the Total loss continues to decrease (right). All parameters p, such as time shift 4, vertical 
shift ¢, amplification factor @ and thus peak-to-peak amplitude «, and initial velocity, were 
scaled to fit in the chart. e Table 2, unscaled values of 3,.c,@ and computed initial velocity, scaling 
method. Figure 22, midspan displacements, Steps 50000 & 400000. (2394R1d-2) 
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Figure 24: JAX. Pinned-pinned bar, No shift & amplification, Remark 5.17. * Left: Loss 
function. Right: Step 198,000, lowest loss 1.724e-06, midspan displacement, times at local 
maxima (1., 3.), times at local minima (0., 2., 4.), damping%=1.7%, Quality: Good. x Sec- 
tion 5.3.1, Form 1. Network: Remarks 5.3, 5.5, T=4, W=64, H=4, Uniform initializer, 12,737 
parameters, random grid. Training: Remark 5.10, LRS 4, init_lr=0.002, factor_Ir=[0.9, 0.9, 
0.9, 0.8, 0.8], n-cycles=5, N_steps=200,000. Total GPU time 564 sec. e Figure 21, DDE-T, 
shift and amplification. Figure 22, DDE-T, shift-amplification parameters. Figure 23, DDE-T, shift- 
amplification parameters increase with training step number. > Figure 35, DDE-T, Form 2a, visually 
quasi-perfect midspan displacement. (23917R1d-1) 


—— Train loss 
—— Test loss 


10-4 


0 20000 40000 60000 80000 100000 +) = 0.8 fe) 
# Steps 1.0 

Figure 25: DDE-T. Pinned-pinned bar, early stopping. x Left: Loss function, lowest value 
2.30e-05 at Step 24,000, value 0.25 at Step 100,000, divergence. Right: Shape, Step 24,000. 
* Section 5.3.1, Form |. Network: Remarks 5.3, 5.5, T=4, W=64, H=4, n_out=1, He-uniform 
initializer, 12,737 parameters, regular grid. Training: Remark 5.6, LRS 1 (CA), init_Ir=0.03. 
e Figure 26, midspan displacement, Step 25,000, and zero midspan displacement, Step 100,000 (diver- 
gence). > Figure 5, reference solution to compare. (2386R2b-1) 
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Figure 26: DDE-T. Pinned-pinned bar. Midspan displacement, damping, Step 25,000 (left), 
zero midspan displacement, Step 100,000 (divergence, right). Section 5.3.1, Form 1. Network: 
Remarks 5.3, 5.5, T=4, W=64, H=4, n_out=1, He-uniform initializer, 12,737 parameters, regu- 
lar grid. Training: Remark 5.6, LRS 1 (CA), init_Ir=0.03. e Figure 25, loss function and shape, 
Step 24,000 with lowest loss. (2386R2b-2) 


Using DDE-T, several different sets of parameters were considered with the number of network param- 
eters varying from 12,737 down to 33, and with two initial learning rates, 0.001 and 0.0001. The results 
from four different network-parameter numbers (from 12,737 down to 105) and with the initial learning 
rate of 0.001 are given in Figure 27, in which all subfigures had these common parameters, which are not 
repeated in the figure caption: T=8, N_out=1, Glorot-uniform initializer, random grid. Idential results (i.e., 
static solutions) were also obtained with the smaller initial learning rate 0.0001. a 


5.3.2 Form 2a: No time shift, static solution, efficiency 


The PINN Form 2a for the axial equation of motion of an elastic bar is a particular case of the PINN Form 
2a for the Kirchhoff-Love rod in Section 3.3. 


su) 4+ fy =Dy, W=Ddx, (88) 
with the initial conditions: 
u(X,0) =U, Px(X,0) =%. (89) 


In the numerical examples, we use homogeneous initial conditions: 


Remark 5.20. Static solution, methods to avoid. Pinned-pinned bar. Using DDE-T Form 2a (time- 
derivative splitting) and a regular grid, Figure 29 show the loss function and a static-shape time history (or 
“static solution” for short, Remark 3.3), with the velocity and midspan-displacement time histories shown 
in Figure 30, all figures were from Step 50,000 of the same run. A sharp jump with a flat plateau along the 
time axis in the time history, such as in the right subfigure of Figure 30, is a telltale sign of a static solution. 


Because of the perfect alignment of the data points along the time axis, regular grids would lead to 
Static solutions easier than random grids. 

A pre-static time history is a solution at an early optimization step (e.g., 25,000) having the character- 
istic (a jump with a wavy plateau) that portend a static time history at subsequent optimization steps, and 
thus is a sign to stop the optimization process early (no need to continue further). Two pre-static midspan- 
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0 1 2 3 4 5 8 7 8 


(b) 12,737 params, Free-end disp (c) 1,185 params, Shape 25000 (d) 1,185 params, Free-end disp 


(e) 337 params, Shape 25000 (f) 337 params, Free-end disp (g) 105 params, Shape 25000 (h) 105 params, Free-end disp 


Figure 27: DDE-T, Form I, pinned-free bar, static solutions (Remark 5.19, Section 5.3.1). 
e SubFigs. 27a-27b: Shape, free-end displacement, Step 25000; GPU time 655 sec (Deter- 
ministic mode) for 200,000 steps. (2398R2a). Shape, free-end disp, step number for other pairs: 
(27c-27d) (2398R2b), (27e-27f) (2398R2c), (27g-27h) (2398R2d). e Network: Remarks 5.3, 5.5, T=8, 
W=64, H=2, n_out=1, Glorot-uniform initializer, random grid. Training: Remark 5.6 LRS 1, 
init_lr=0.001, N_steps=25000. e Figure 28, using our JAX script, no static solution. > Figure 38, 
Form 2a, GPU time 605 sec for 200,000 steps. Figure 45, Form 3, GPU time 537 sec for 200,000 steps. 
> Figure 5, reference solution to compare. > Appendix 2, Figure 50, barrier-function effects. 
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(d) 1,185 params, Free-end disp 


(e) 337 params, Shape 350000 (f) 337 params, Free-end disp (g) 105 params, Shape 599000 (h) 105 params, Free-end disp 


Figure 28: JAX. Form I, pinned-free bar. Remark 5.10 (LRS 4). e SubFigs 28a-28b: 12,737 
params, Step 49,000, Cycle 2 lowest loss 5.324e-06, damping%=0.25%, Quasi-perfect, GPU 
time 161 sec. Network: Remarks 5.3, 5.5, T=8, W=64, H=2, n_out=1, Uniform initializer, 
random grid. Training: Remark 5.10 LRS 4, init_lr=0.002, factor_Ir=[0.9, 0.9, 0.9, 0.8, 0.8, 
1, 1, 1, 1]. @3916R1a.4). « SubFigs 28c-28d: Cycle 3 lowest loss 4.381e-06, damping %=0.2%, 
Quasi-perfect, GPU time 107 sec (23916R1b.1). e SubFigs 28e-28f: Cycle 8 lowest loss 5.445e-06, 
damping%=0.0014%, Quasi-perfect, GPU time 335 sec (23916R1c.2.2). e SubFigs 28g-28h: Cycle 
13 lowest loss 2.749e-05, damping%=3.7%, High damping, GPU time 505 sec (23916R1d.2.3). @ 
Figure 27, using DDE-T, static solutions. > Figure 5, reference solution to compare. 
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displacement time histories at Step 25,000 are shown in Figure 31, both with init_Ir=0.03. The optimization 
for the left subfigure continues to Step 200,000 to exhibit a clear static solution shown in Figures 32-33, 
whereas the optimization for the right subfigure, where the jump at time ¢ = 0 is already sure sign of a static 
solution, stopped at Step 25,000. 


Several methods below (or a combination of these) could be used to avoid the static solution: (1) Use 
random grid (Figure 35) instead of regular grid (Figures 29-30), (2) Reduce the initial learning rate init_Ir 
(see below), (3) Reduce network capacity (number of network parameters) (see below), (4) Use barrier 
function (see Appendix 2). 


Using the same Form 2a and network with 12,802 parameters as in Figures 29, and reduce init_Ir from 
0.01 to various values below. Reducing the initial learning rate by 100 times to init_lr=0.0001 with NCA 
produced un-accentuated waves with high damping at Step 200,000. On the other hand, reducing by 10 
times to init_Ir=0.001 with CA, and extending Step 5 to 250000 with NCA (Remark 5.9), resulted in a good 
solution with small damping, Figure 34. See also Remark 5.16, Unstable solution. 


Reducing the network capacity would allow using a much larger learning rate for similar results. 
Figure 6 shows a static solution using a network with only 1,218 parameters (ten times smaller) with 
init_Ir=0.07 (seven times larger) compared to Figures 29-30 using a network with 12,802 parameters and 
init_lr=0.01. Figure 39 shows a solution obtained with DDE-T Form 3 using a network with 12,867 parame- 
ters, init_lr=0.01, NCA, resulting in the Cycle 3 lowest loss of 1.179e-06 at Step 88,000, and a quasi-perfect 
midspan displacement of a pinned-pinned bar. 


Pinned-free bar. The static solutions in DDE-T Form 1 in Figure 27 are totally avoided by using 
DDE-T Form 2a (Figure 38), and of course DDE-T Form 3 (SubFigs 451-451). a 
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Figure 29: DDE-T. Pinned-pinned bar, static solution. Loss function (left), static solution 
(right), Step 50,000, end of Cycle 2. « Remark 3.3, Static solution; Remark 5.20, How to avoid. 
Section 5.3.2, Form 2a. Network: Remarks 5.3, 5.5, T=4, W=64, H=4, n_out=2, He-uniform 
initializer, 12,802 parameters, « regular grid. Training: Remark 5.8, LRS 3, init_lr=0.01, n- 
cycles=2, N_steps=50,000. e Figure 30, velocity, midspan displacement. > Figure 34, regular grid, 
smaller init_lr=0.001. Figure 35, random grid, no static solution (same LRS 3 and init_Ir). (23725R1-1) 
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Figure 30: DDE-T. Pinned-pinned bar, static solution. Essentially-zero velocity (left); static 
midspan displacement (right), Step 50,000, end of Cycle 2. « Remark 3.3, Static solution; 
Remark 5.20, How to avoid. Section 5.3.2, Form 2a. Network: Remarks 5.3, 5.5, T=4, 
W=64, H=4, n_out=2, He-uniform initializer, 12,802 parameters, « regular grid. Training: 
Remark 5.8, LRS 3, init_lr=0.01, n-cycles=2, N_steps=50,000. e Figure 29, loss function, shape. 

> Figure 34, regular grid, smaller init_lr. Figure 35, random grid, no static solution (same LRS 3 and 
init_Ir). (23725R1-2) 
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Figure 31: DDE-T. Pinned-pinned bar. Pre-static midspan-displacement time histories at 
Step 25,000 (end of Cycle 1). * Remarks 3.3, 5.20. Section 5.3.2, Form 2a. * Network: 
Remark 5.3, 5.5, T=4, W=64, H=2, n_out=2, He-uniform initializer, 4,482 parameters, regu- 
lar grid. Training: Remark 5.8, LRS 3, init_Ir=0.03. * Left: n-cycles=5, N_steps=200,000; 
Figures 32-33, Step 200,000. (23815R3a-1). « Right: n-cycles=1, N_steps=25,000, jump at t=0. 
(2389R2-1). 
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Figure 32: DDE-T. Pinned-pinned bar, no cyclic annealing (NCA). Loss function (left) and 
Static-shape (right), Step 200,000. « Remarks 3.3, 5.20. Section 5.3.2, Form 2a. Network: 
Remarks 5.3, 5.5, T=4, W=64, H=2, n_out=2, He-uniform initializer, 4,482 parameters, regular 
grid. x Training: Remark 5.8, LRS 3, init_Ir=0.03. e Figure 31, left, midspan displacement, Step 
25,000. Figure 33, velocity, midspan displacement, Step 200,000. (23815R3a-2) 


0.015 


1.0 00 OS 10 15 20 25 30 35 40 
Figure 33: DDE-T. Pinned-pinned bar, no cyclic annealing (NCA). x Essentially-zero Veloc- 
ity (left), static midspan displacement (right), Step 200,000. « Remarks 3.3, 5.20. Section 5.3.2, 
Form 2a. Network: Remarks 5.3, 5.5, T=4, W=64, H=2, n_out=2, He-uniform initializer, 4,482 
parameters, regular grid. Training: Remark 5.8, LRS 3, init_Ir=0.03. e Figure 31, left, midspan 
displacement, Step 25,000. Figure 32, loss function, static shape, Step 200,000. (23815R3a-3) 
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0.0 05 10 15 20 25 3.0 35 4.0 
Figure 34: DDE-T. Pinned-pinned bar. Shape (left), midspan displacement (right), small 
damping, Step 250,000. x* Section 5.3.2, Form 2a. Network: Remarks 5.3, 5.5, T=4, 
W=64, H=4, n_out=2, He-uniform initializer, 12,802 parameters, regular grid. Training: 

* Remarks 5.6 (LRS 1), 5.9 (ELRS), init_Ir=0.001, Cycles 1-5 (CA), Cycle 6 (NCA), 
N_steps=250,000. « Lowest total loss 2.55e-06, Step 250,000 (sum of 6 losses). @ Figures 29-30, 
init_Ir=0.01, static solution. Figure 15, 4,802 parameters, good solution. > Figure 5, reference solution 

to compare. (23822R3b-1) 


5.3.3 Form 2b: Same issues as Form | 

Form 2b for the axial equation of motion of an elastic bar is a particular case of Form 2b for the 
Kirchhoff-Love rod in Section 3.4: 
04 Fy =t, tas, (91) 
with the initial conditions 
u(X,0)=%, U(X,0)=%p, 3y(X,0) =3q0. (92) 
In the numerical examples, we use homogeneous initial conditions: 


u(X,0)=0, w(X,0)=0, 3y(X,0)=0. (93) 


Form 2b (splitting the space derivative operator) does not resolve the shift and amplification like Form 
2a (splitting the time derivative operator). For example, using the same parameters as in Figure 20 for the 
pinned-pinned bar, shift and amplification, albeit with a different magnitude, still persisted. 


Form 2b also yielded the same static solutions as those in Figure 27. 


5.3.4 Form 3 (or 4): No space/time shift, static solution, efficiency 


Form 3 (or 4) for the axial equation of motion of an elastic bar is a particular case of Form 3 (or 4) for the 
Kirchhoff-Love rod in Section 3.5 (or 3.6): 


sa + fx =dxy, DI=5,, U=Py, (94) 
with the initial conditions 


u(X,0)=%, Su(X,0)=3u0, Dx(X,0) =t%. (95) 
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Figure 35: DDE-T. Pinned-pinned bar. Step 200,000: Shape (left), midspan displace- 
ment (right), visually quasi-perfect. *« Remark 5.13, Optimal capacity. Remark 5.16, Un- 
stable solution. Remark 5.20, Avoiding static solution. Section 5.3.2, Form 2a. WNet- 
work: * Remarks 5.3, 5.5, T=4, W=64, H=4, n_out=2, He-uniform initializer, 12,802 pa- 
rameters, x random grid. Training: « Remark 5.8 LRS 3 (NCA), init_lr=0.01. n-cycles=1, 
N_steps=200,000. « Lowest total loss 0.537e-06, Step 192,000 (sum of 6 losses). Total GPU 
time 640 sec. e Figure 36, early stopping at Step 146,000, good trade-off. > Figures 29-30, regular 
grid, static solution (same LRS 3 and init_Ir). Figure 15, 4,802 parameters, good solution. Figure 14, 
1,218 parameters, very-good solution. (2384R2a-1) 
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Figure 36: DDE-T. Pinned-pinned bar. Left: Loss function. Right: Step 146,000, midspan 
displacement (right), damping%=0.3%, Quasi-perfect. « Section 5.3.2, Form 2a. Network: 
* Remarks 5.3, 5.5, T=4, W=64, H=4, n_out=2, He-uniform initializer, 12,802 parameters, x 
random grid. Training: x Remark 5.8 LRS 3 (NCA), init_Ir=0.01. « Total loss 0.722e-06, Step 
146,000 (sum of 6 losses). Total GPU time 442 sec. e Figure 35, running through to Step 200,000. 
> Figures 29-30, regular grid, static solution (same LRS 3 and init_Ir). Figure 15, 4,802 parameters, 
good solution. Figure 14, 1,218 parameters, very-good solution. > Figure 5, reference solution to 
compare. (2384R2d-1) 
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Figure 37: DDE-T. Pinned-free bar. He-uniform initializer. x Static shape (left), free-end 
displacement (right), Step 25,000, end of Cycle 1. Section 5.3.2, Form 2a. Network: Re- 
marks 5.3, 5.5, T=8, W=64, H=4, n_out=2, He-uniform initializer, 12,802 parameters, regular 
grid. Training: Remark 5.8, LRS 3, init_lr=0.005. e Figure 38, same model, LRS 3, init_lr=0.005, 
random grid, quasi-perfect solution. (2395R3a-1) 


i) 1 2 3 4 5 6 7 
Figure 38: DDE-T. Pinned-free bar. He-uniform initializer. « Step 100,000: Shape (left), 
free-end displacement (right), damping%=0.14%, Quasi-perfect. Section 5.3.2, Form 2a. Net- 
work: Remarks 5.3, 5.5, T=8, W=64, H=4, n_out=2, He-uniform initializer, 12,802 parameters, 
random grid. Training: Remark 5.8, LRS 3, init_lr=0.005, Cycle 3 lowest loss 2.472e-06, GPU 
time 318 sec. (2395R3b-1) @ (Cycle 5 lowest loss 1.268e-06, GPU time 605 sec (Deterministic mode). 
(2395R3b.3)) Figure 37, same model, LRS 3, init_lr=0.005, regular grid, static solution. > Figure 27 Form 
1, static solutions. Figure 45, Form 3, static and quasi-perfect solutions. > Figure 5, reference solution 
to compare. 
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In the numerical examples, we use homogeneous initial conditions: 


u(X,0)=0, Su(X,0) =0, Px(X,0) =0. (96) 


(a) Loss function (b) Shape t.h. (c) Velocity t.h. (d) Slope t.h. 


(e) Midspan disp t-h. (f) Right-end disp t.h. (g) Right-end slope t.h. 


Figure 39: DDE-T. Pinned-pinned bar. Step 88,000: “t.h.’ = time history. SubFig. 39e, 
midspan displacement t.h., damping %=0.1%, Quasi-perfect. x Section 5.3.2, Form 3. Network: 
* Remarks 5.3, 5.5, T=4, W=64, H=4, n_out=3, He-uniform initializer, 12,867 parameters, « 
random grid. Training: x Remark 5.8 LRS 3 (NCA), init_Ir=0.01. * Total loss 1.179e-06, 
(sum of 7 losses). Total GPU time 278 sec. e The above is a good trade-off with the results in 
Figures 35-36. > Figure 5, reference solution to compare. (2384R2e-1) 


Figures 40-41 for the axial motion of a pinned-pinned bar were obtained using Form 3, VCA (Re- 
mark 5.7), network with 1,251 parameters, regular grid, for 200,000 steps, and resulted in the lowest total 
loss of 4.31e-06 and GPU time of 309 sec. 


For the same pinned-pinned bar, Form 3, network with 1,251 paramters, but VCA with ELRS (Re- 
mark 5.9) and random grid for 400,000 steps, Figures 42, 43, 44, yielded a solution with a total loss of 
1.25e-06 (3 times smaller compared to 4.31e-06 in Figures 40-41) and total GPU time of 598 sec (less than 
2 times longer than 309 sec in Figures 40-41). See Remark 5.5 regarding the lowest total loss achieved, 
0.518e-06, for axial motion of a pinned-pinned bar. 


While symmetry can be observed in the quasi-perfect midspan-displacement time history in Figure 42, a 
lack of symmetry was revealed in the computed slope (space derivative) time history of the right pinned-end 
in Figure 44, which showed a shift downward by 0.03 (since the slope must be zero at time t = 0) anda 
clear departure from symmetry near time t = 4. 

Compared to Form 1 used in Figure 21, with a network of 12,737 parameters, CA (Remark 5.6) over 


200,000 steps, with a total GPU time of 621 sec, Form 3 used in Figure 42 with 1,251 parameters and a total 
GPU time of 598 sec is still 4% more efficient, despite running over twice the number of steps to 400,000. 


Remark 5.21. Switching from Form 2a to Form 3, DDE-T. To obtain results using Form 3 in DDE-T, 
the starting point was to use the same parameters (with init_lr=0.005, He-uniform initializer) used for Form 
2a in DDE-T that resulted in Figure 38, with only Form 2a changed to Form 3; the result was a pre-static 
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0.0 05 10 1s 20 25 3.0 35 40 
Figure 40: DDE-T. Pinned-pinned bar, VCA. Shape (left), midspan displacement, Step 
200,000 (right), waves, small damping. x Section 5.3.4, Form 3. Remark 5.14, Damping. 
Network: Remarks 5.3, 5.5, T=4, W=32, H=2, n_out=2, He-uniform initializer, 1,251 param- 
eters, x regular grid. Training: x Remark 5.7, LRS 2 (VCA), init_Ir=[0.04, 0.03, 0.02, 0.01, 
0.005]. « Lowest total loss 4.31e-06, Step 200,000 (sum of 7 losses). * Total GPU time 309 
sec. @ Figure 41, velocity, slope. > Figure 9, CA, Form 2a, same model and init_Ir=0.04, static solution. 
Figure 10, VCA, Form 2a, same model and init_lr sequence, waves, small damping. (23821RIc-1) 


0 


1.0 
Figure 41: DDE-T. Pinned-pinned bar, VCA. Velocity (left), slope (right), Step 200,000, 
waves, small damping. « Section 5.3.4, Form 3. Network: « Remarks 5.3, 5.5, T=4, W=32, 
H=2, n_out=2, He-uniform initializer, 1,251 parameters, « regular grid. Training: x Re- 
mark 5.7, LRS 2 (VCA), init_Ir=[0.04, 0.03, 0.02, 0.01, 0.005]. e Figure 40, VCA, Form 3, 
same model and init_Ir sequence, waves with small damping. > Figure 9, CA, Form 2a, same model, 
init_Ir=0.04, static solution. Figure 10, VCA, Form 2a, waves with small damping. (23821R1c-2) 
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— Train loss 
—— Test loss 


T T T T T T T T T 
© 50000 100000 150000 200000 250000 300000 350000 400000 +) 0.8 0 
# Steps 1.0 


Figure 42: DDE-T. Pinned-pinned bar. Loss function (left), shape, Step 400,000 (right). 
Remark 5.14, Damping. Section 5.3.4, Form 3. Network: Remarks 5.3, 5.5, T=4, W=32, H=2, 
n_out=3, He-uniform initializer, 1,251 parameters, random grid. Training: * Remarks 5.7 
(LRS 2), 5.9 (ELRS), init_lr=[0.04, 0.03, 0.02, 0.01, 0.01, 0.005] Cycles 1-6 (VCA); Cycles 
7-9 (NCA). x Lowest total loss 1.25e-06, Step 400,000 (sum of 7 losses). « Total GPU time 
598 sec. e Figure 43, velocity, midspan displacement, Step 400,000. Figure 44, slope, Step 400,000. > 
Figure 35, total loss 0.537e-06. > Figure 5, reference solution to compare. (23824R1-1) 


1.0 0.0 O05 10 150-200 258.0885 
Figure 43: DDE-T. Pinned-pinned bar. x Velocity (left), visually quasi-perfect midspan 
displacement, Step 400,000 (right). x Section 5.3.4, Form 3. Network: Remarks 5.3, 5.5, 
T=4, W=32, H=2, n_out=3, He-uniform initializer, 1,251 parameters, random grid. Training: 
Remark 5.7 (LRS 2), Remark 5.9 (ELRS), init_Ir=[0.04, 0.03, 0.02, 0.01, 0.01, 0.005] Cycles 
1-6 (VCA); Cycles 7-9 (NCA). e Figure 42, loss function, shape, Step 400,000, lowest total loss, 
GPU time. Figure 44, slope, Step 400,000. (23824R1-2) 
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45 


46 IJNME RLT 90th birthday v2.3.6 arXiv (submitted v1 on 2023.10.15) - 2024/07/16 


-0.1 4 


-0.2 4 


-0.3 4 


-0.44 


1.0 00 © 05 10 15 20 25 30 35. 4.0 
Figure 44; DDE-T. Pinned-pinned bar. x Slope (left), right-pinned-end slope, Step 400,000 
(right). *« Section 5.3.4, Form 3. Network: Remark 5.3, 5.5, T=4, W=32, H=2, n_out=3, 
He-uniform initializer, 1,251 parameters, random grid. Training: x Remark 5.7 (LRS 2), 5.9 
(ELRS), init_lr=[0.04, 0.03, 0.02, 0.01, 0.01, 0.005] Cycles 1-6 (VCA); Cycles 7-9 (NCA). e 
Figure 42, loss function, shape, Step 400,000, lowest total loss, GPU time. Figure 43, velocity, midspan 
displacement, Step 400,000. (23824R1-3) 


solution shown in SubFigs 45a-45b. Indeed, increasing init_Ir to 0.01 resulted in a static solution shown in 
SubFigs 45c-45d, and reducing init_lr to 0.001 was not enough to avoid a pre-static solution, SubFigs 45e- 
45f. Returning init_lr to 0.005 and switching from He-uniform to Glorot-uniform initializer resulted in 
a very-good free-end displacement with a Cycle 5 lowest loss of 1.851e-06 at Step 187,000, as shown in 
SubFigs 45¢-45h. Decreasing init_Ir to 0.001, maintaining the Glorot-uniform initialization, then resulted in 
a quasi-perfect free-end displacement, albeit with a higher Cycle 5 lowest loss of 5.877e-06 at Step 200,000, 
as shown in SubFigs 45-45}. 


It is important to note that the above results do not absolutely imply the superiority of the Glorot 
initializer over the He initializer, but as noted before in Remark 5.11, the Glorot initializer is equivalent to a 
smaller learning rate compared to the He initializer. a 


Remark 5.22. Efficiency: Form I vs Form 2a vs Form 3. DDE-T vs JAX. First we used our DDE-T script. 
Based on our numerical experiments on the axial motion of a bar, Form 3 is computationally more efficient 
than Form 2a, which is itself computationally more efficient than Form 1. 


For the pinned-pinned bar, going from Form 3 (531 sec) to Form 2a (561 sec) requires 6% more 
computational time. Going from Form 2a to Form | (624 sec) requires 11% more computational time. 
Going from Form 3 to Form | requires 18% more computational time.'® For the pinned-pinned bar, Form 1 
using the Hessian to compute the 2nd derivative has an average GPU time of 624 sec (RunIDs 2388RS5a,b,c). 
Switching from a Hessian to two Jacobians increased the average GPU time to 693 sec, or 11%. With the 
Form 3 GPU time remaining the same at 531 sec, going from Form 3 to Form 1 using two Jacobians requires 


‘Ror each form, three runs were carried out to obtain the average GPU time. The additional GPU time required for 
moving from one form to another was based on these average GPU times. For the pinned-pinned bar, see RunIDs 
2388R2a,b,c for Form 3 (531 sec), RunIDs 2388R6a,b,c for Form 2a (561 sec), and RunIDs 23101R3a,b,c for Form 
1 with Hessian (624 sec). 
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(a) Init_lr=0.005, He, pre-static. (b) Free-end slope, 197000. 


(d) Free-end disp, 50,000. 


(f) Free-end slope, 200,000. 


0000 125000 150000 175000 200000 
# Steps oO 1 2 3 4 5 6 7 8 


(i) Init_Ir=0.001, loss, Glorot. (j) Free-end disp, Quasi-perfect. (k) Velocity time history. (l) Slope t.h. 200,000. 


Figure 45: DDE-T. Pinned-free bar, Form 3 (Section 5.3.4). Remark 5.21, Switch Form 2a 
to Form 3. e SubFigs 45a-45b, Pre-static solution. Step 197,000: (45a) Free-end displace- 
ment with velocity discontinuity at t=1 and upward shift. (45b) Free-end slope jump with 
large amplitude oscillations (negative space derivative) starting before t=1. « Section 5.3.4, 
Form 3. Network: Remark 5.3, 5.5, T=8, W=64, H=4, n_out=3, He-uniform initializer, 12,867 
parameters, random grid. Training: x Remark 5.8 (LRS 3, NCA), init_lr=0.005, GPU time 
537 sec. (2395R3d-1) e SubFigs 45c-45d, Static shape, free-end disp, Step 50,000, init_Ir=0.01, 
He-uniform. (2395R3d.2) e SubFigs 45e-45f, pre-static free-end disp, slope (space derivative), 
Step 200,000, init_Ir=0.001, He-uniform. (2395R3d.5) e SubFigs 45g-45h, lowest loss 1.85 1e-06, 
Step 187,000, shape, Very good free-end disp, Glorot-uniform initializer (Remark 5.11), 
init_Ir=0.005, damping%=-0.7%. (2395R3d.3) x SubFigs 451-451, Step 200,000: (451) Lowest 
loss 5.877e-06, init_lr=0.001, Glorot-uniform, damping%=-0.13%, (45j) Quasi-perfect free- 
end disp, (45k) Velocity time history (t-h.), (45k) Slope (space derivative) t-h., GPU 533 sec 
(Deterministic mode). (2395R3d.4) @ Figure 38, Form 2a, same model parameters as SubFigs 45a-45b, 
Quasi-perfect free-end disp. > Figure 5, reference solution to compare. 
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—— Train loss 


oO 20000 40000 60000 80000 100000 
# Steps 


(a) JAX Form 1, loss 100,000. (c) Free-end disp, Very good. 


(d) JAX Form 3, velocity 145,000. (e) Slope time history. (f) Free-end disp, Quasi-perfect. (g) Free-end slope. 


Figure 46: JAX. Pinned-free bar. SubFigs 46a-46c, Form 1: (46a) Step 100,000, Total loss 
1.749e-06 (sum of 5 losses), Average loss 0.350e-06. Network: Remark 5.3, 5.5, T=8, W=64, 
H=4, n_out=1, Uniform initializer, 12,737 parameters, random grid. Training: * Remark 5.8 
(LRS 4), init_Ir=0.002, factor_Ir = [0.2, 0.2, 0.2, 0.2, 0.2, 1, 1, 1, 1]. (46b) Shape time history 
(t.h.). Damping%=-0.94%, (46c) Very good free-end displacement. (23916R1a.8) e Form 3: (46d) 
Step 145,000, velocity t.h., Total loss 4.063e-06, Average loss 0.580e-06. Network: Same as 
the above, except n_out=3, 12,867 parameters. Training: Same as the above. (46e) Slope 
(space derivative) t.-h. Damping%=-0.3%, (46f) Quasi-perfect free-end disp t.h.. (46g) Free- 
end slope (space derivative) t.h., peak-to-peak amplitude 8.658e-03. (23916R1a.7.2) 
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31% more computational time. 


For the pinned-free bar, the results (obtained in Deterministic mode) in Figure 27 (Form 1, 655 sec), 
Figure 38 (Form 2a, 605 sec), and Figure 45 (Form 3, 533 sec), indicate that going from From 3 to Form 
2a requires 14% more computational time, going from Form 2a to Form 1 requires 8% more computational 
time, going from Form 3 to Form 1 requires 23% more computational time. The DDE-T results in Non- 
deterministic mode are tabulated in Table 1, which is explained below. These percentages would change with 
the number of derivatives on a PDE system, with the axial motion having the lowest number of derivatives 
considered here, with two space derivatives and two time derivatives. 


After obtaining the above results for the pinned-free bar with our DDE-T script, we ran the same cases 
with our JAX script to compare Form 1 to Form 3. Using JAX Form 3, the total loss of, 5.135e-06 (average 
loss 0.734e-06) at Step 97000 after 357 sec (at Step 100,000, which is before Step 145,000 the results at 
which are shown in SubFigs 46d-46g) is a good trade-off for the loss 5.877e-06 at Step 200,000 (end of 
Cycle 5) after 533 sec using DDE-T Form 3 in SubFigs 45i-451. In SubFigs 46d-46g, it took 529 sec to 
train 150,000 steps of JAX Form 3 to get a loss of 4.104e-06 and Cycle 4 lowest loss of 4.063e-06 at Step 
145000. 


All results depend crucially on the learning-rate schedule and the initializer used, meaning in principle 
one could conceivably design a learning-rate schedule with an appropriate initializer for DDE-T Form 3 
to reach a total loss below 4.104e-06 in 150000 steps or less. Hence in the end, the GPU time for a fixed 
number of steps is important and more meaningful for comparing between two different implementations. 


The GPU times for DDE-T and for JAX for 200,000 training steps in each form in Non-deterministic 
mode are tabulated in Table 1, which shows that JAX Form | is faster than JAX Form 2a, which is faster 
than JAX Form 3, the opposite of DDE-T.'' The results in Table 1 show that our DDE-T script is faster than 
our JAX script. Using DDE-T GPU time as reference, DDE-T Form 1 (529 s) is more efficient than JAX 
Form | (575 s) by 9%, whereas DDE-T Form 2a (498 s) is more efficient than JAX Form 2a (644 s) by 29%, 
and DDE-T Form 3 (488 sec) is more efficient than JAX Form 3 (733 sec) by 50%. a 


Table 1: Pinned-free bar. Remark 5.22, Efficiency, DDE-T vs JAX. GPU time for 200,000 
training steps. All ran in Non-deterministic mode. Each number represents the average GPU 
time of three runs (see Footnote 11). Network: Remark 5.3, 5.5, T=8, W=64, H=4. 


Form | | Form 2a | Form 3 


DDE-T | 529 sec | 498 sec | 488 sec 
JAX | 575 sec | 644sec | 733 sec 


Remark 5.23. JAX Deterministic slightly faster than Non-deterministic. Unlike DDE-T for which running 
in deterministic mode carries an average penalty of about 4% in GPU time compared to non-deterministic 
mode (or 10% for a single case as noted in Remark 5.15), it is surprising to note that JAX in deterministic 
mode is more efficient than in non-deterministic mode. Re-running each JAX non-deterministic case in 
Table | three times in deterministic mode, the corresponding average GPU times are: 550 sec (JAX Form 1, 


"Table 1. For DDE-T Form 1, the average GPU time of three runs was 529 sec: RunIDs 2398R2a.3 (538 s), 
2398R2a.4 (523 s), 2398R2a.5 (525 s). DDE-T Form 2a, average GPU time 498 sec: 2398R2a.6 (503 s), 2398R2a.6 
(504 s), 2398R2a.6 (487 s). DDE-T Form 3, average GPU time 488 sec: 2398R2a.9 (499 s), 2398R2a.10 (477 
S), 2398R2a.11 (488 s). e For JAX Form 1, average GPU time 575 sec: 23916R1a8 (588 s), 23.9.16 Rla.8.3 
(554 s), 23.9.16 Rla.8.4 (583 s). JAX Form 2a, average GPU time 644 sec: 231024R2c (618 s), 231025R2b 
(640 s), 231025R2c (675 s). JAX Form 3, average GPU time 733 sec: 23916R1a7 (701 s), 23916R1a7.2 (751 
s), 23916R1a7.3 (747 s). 
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or 4% less than non-deterministic), 628 sec (JAX Form 2a, or 2.5% less), 689 sec (JAX Form 3, or 6% less). 
a 


6 Closure 


Following up our review of deep learning applied to computational mechanics [1], we developed novel 
PINN formulations for nonlinear dynamics governed by partial-differential-algebraic equations (PDAEs), 
exemplified by the equations of motion of a geometrically-exact Kirchhoff rod in higher-level forms, initially 
developed to address the pathological problems of shift and amplification encountered with DDE-T, which 
is likely the most well-documented among PINN frameworks. 


For the static-solution problem, we consider the use of barrier functions, which themselves represent 
a novel application in PINN, to prevent the training iterate to get close to a static solution. Even though 
the barrier function is an elegant method, it still has room for improvement, while simpler methods such 
as reducing the model capacity (using lower number of parameters), use a different initializer, and smaller 
learning rate led to satisfactory results. 


Another novelty is to apply PINN directly to the highest-level momentum form (balance of momenta) 
as it would save much time and effort to hand derive from the highest-level from to the lowest-level form 
through to computational formulation and implementation as done in traditional approach. 


In parallel with using DDE-T, we developed a script based on JAX, which does not exhibit the patho- 
logical problems of DDE-T, it is slower than DDE-T in all forms tested. 


To help readers reproduce our results, we normalize/standardize the training process, which is important 
to compare different training strategies due to the large number of parameters involved. Each figure is 
accompanied with a caption that details all parameters used to obtain the results. The computed gradients in 
both DDE-T and JAXlikely hold the key for the behavior (pathological problems and computational speed) 
observed. 


Large computational efficiency with the proposed PINN formulations recently obtained for the trans- 
verse motion of the Euler-Bernoulli beam and for the geometrically-exact Kirchhoff rod will be reported in 
a follow-up paper. 
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Appendices 


1 Analysis of time shift and amplification 


First, we analyze the computed solution of Form 1 of the axial motion of a pinned-pinned elastic bar 


with distributed axial force (Section 5.3.1), with ¥ = X: 


PDE: u?) +f, =, (64) 
BCs: u(0,t) =u(1,t)=0, (85) 
ICs: u(z,0)=0, w(z,0)=0. (87) 


To alleviate the notation, we omit the overbar designating non-dimensionalization on (x,t), while keeping 
the overbar in w@ and f,,. Let the computed solution w(x, t) be related to the exact solution (2, t) by (see 
Figures 22-23): 

1 


u(x,t) = eu(z,t—3)+ce> U(z,t) = = [a(a,t+4)—-e], (97) 


where «@ is the amplification parameter, 4 the time shift parameter, and ¢ the vertical shift parameter. These 
are the three hidden parameters, in addition to the network parameters, for the training to adjust to reduce the 
loss. Figure 23 shows that as the shift and amplification (scaled from their true values in Table 2) increase, 
the total loss continues to decrease during training. 


Table 2: Pinned-pinned bar. Form 1. Section 1, Analysis. Unscaled shift-amplification param- 
eters (5, ¢, @), with A being constant, initial velocity, left-end displacement versus checkpoint 
training step number. e Figure 23, scaled shift-amplification parameters and initial velocity increase 


with training, using Pscalea = (Pp — posc00) * ky > 0, where Pgcatea is the positive scaled parameter 
p, with p2s090 the value of p at Step 25000, and k, a selected scale factor for p: k; = 10, k, = 100, 
kes) = 10, Ktnit vel = 10. 


Step | Time shift 3 | Vertical shift c | Amplitude 2 | Initial velocity | Left-end disp 
25000 0 0.00973727 0.08445985 0.01150370 0.00973727 
50000 0.107 -0.00462352 0.13343337 -0.06753206 | -0.00094142 

100000 0.146 -0.00861757 0.14141834 -0.09649991 | -0.00139512 
150000 0.168 -0.01143161 0.14696970 -0.11479854 | -0.00152922 
200000 0.184 -0.01377921 0.15177655 -0.12892485 | -0.00155308 
250000 0.196 -0.01580278 0.15583168 -0.14010072 | -0.00157554 
300000 0.204 -0.01743147 0.15909590 -0.14862418 | -0.00158554 
350000 0.212 -0.01873142 0.16170362 -0.15547872 | -0.00158999 
400000 0.216 -0.01975145 0.16374922 -0.16018747 | -0.00159215 


The peak-to-peak amplitude of @ (if known), or of a computed quasi-perfect solution, used as a reference 
is denoted by A, so that 2 is the peak-to-peak amplitude of the computed solution % (Figure 22). 


The space and time derivatives of u(x, t) are 


1 
Ugg Gh) = Ble Gt — 3) S Ue G0) = F use(Z,t +3) , 


1 1 
ti(a, t) — 7 ula. t +9) A tt (a, t) = 7 telat +3) ; 


(98) 


(99) 
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which when substituted in Eq. (64) yield 
linn (x,t +4) + af,(z,t) = tye(2,t +d) > ter (z, t) + af,(z,t —) = te(a, t) , (100) 


with boundary conditions at x» = 0 and at x, = 1: 


TO ee = act Da ae Se: (101) 
and initial conditions: 

ate0p=t= = ton asec; (102) 
u(x, 0) =0= = [iu(2,3) Gi led e30 (103) 


Due to the computed displacement being enforced by the Dirichlet boundary condition, i.e., 
a(xz,0) =e x 0= eu(z,-5) +ce> c= —aU(x,—-5) + €, (104) 


where ¢ is a small number or the order of 10~°, the “essentially zero” for this type of network training, as 
shown in Table 2. So if 4 > O (right time shift), since U(z,—3) > 0 and @ > 0, it follows that c < 0 if 
«@u(x,—3) > €, which is a condition met in the example in Table 2. 


The Neumann boundary condition was used in DeepXDE for the initial velocity, but did not enforce the 
time derivative of the computed solution to the “essentially zero” ¢€. Instead, the initial velocity was allowed 
to be non-zero: 


te(x, 0) a aux, =A) #0, (105) 
and thus if 3 > 0, since u(x, —3) < 0, it follows that i;(2,0) < 0, which is observed in Figure 22. 


Second, we show that Form 2a of the pinned-pinned elastic bar (Section 5.3.2) 


PDE-1: @?) +f, =Dp ; (88) 
PDE-2: %=D,, (88) 
BCs: @u(0,t) =U(1,t) =0, (85) 
ics apo, pele. 0)=0. (90) 


have neither shift nor amplification. The analysis for Form 3 is the same, and is not repeated. The explicit 
Dirichlet boundary condition imposed an “essentially zero” € on the initial velocity of the computed solution: 


t(x,0) = € = a%;(x,0 -—3) > & (xz, -3) = e/a 0, (106) 


with 2 # 0 and in the vicinity of zero. But the initial velocity of the exact solution is 7;(a,0) = 0. Hence, 
u(x, —3) — €/a = U(x, 0) = 0, and thus s = 0 (“essentially zero”). At this point @ > 0 can be anything. 
The explicit Dirichlet boundary condition also imposes an “essentially zero” € on the initial displacement of 
the computed solution: 

u(x,0) =€= au(z,0-3) + e¢= au(r,0)+e>c=c€20, (107) 


which is also an “essentially zero” (see the left-end displacement in Table 2). Now substituting u = u/@ 
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into the above Form 2a of the PDE system to obtain: 


PDE-1: tigen /@ + fp = Dy > tier + Of, = @Dy , (108) 
PDE-2: t%/@ = Dy > tt = @D, , (109) 
BCs: «w(0,t) = u(1,t) =0, (85) 
Cs: de) =0, p20) =0. (90) 


Thus w would be the solution of the above PDE system only if we had amplified the applied force f,, by the 
amplification factor @. In other words, if we did not amplify the applied force, i.e., @ = 1, then & = 4%, or 
in practice, % & U. 


Figure 47: Inverse barrier function vs log barrier function. Appendix 2. Shifted barrier, 
buffer-zone depth 7. 


2 Filtering static solutions, barrier functions 


A Static solution is a solution of the governing equations of motion, such as Eqs. (41)-(42), with zero inertia 


force on the right hand side, abstractly written based on the dynamic nonlinear differential operator gi 
defined in Eq. (82) as: 


By 9 = SE OKs, Dye tuys Beuy}) = Sf xX) = 0, 


q 10 


bet GH in”: (110) 


7 
where se is a static nonlinear differential operator, which is a static part of the balance of linear momenta 
in PINN Form k, with nl) = 0 in Eq. (82), and Ug = (Tet, Vs) the exact static solutions for @ = (U,V). 
For convenience, the following static-operator array is defined based on the dynamic-operator array defined 
in Eq. (83): 


EGP aia hs b= 0,204,3% (111) 


To prevent the minimization iterate during the minimization process from reaching the static solution Uy, 
the loss function J can be augmented by a barrier (or penalty) function @(w) that would drastically increase 
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the loss, thus creating a barrier (penalty), when the minimization iterate comes close to these solutions. 


For a generic constrained nonlinear optimization problem of the form: 


min f(x) such that0O <a, (112) 
the nonlinear objective function f(a) can be augmented by a barrier (penalty) function 4(x): 
JO = f(x) + w G(x) , with (113) 
. if 
E(x) = 6™ (2) = #O (2) = , or (114) 
r—a 
1 
G(x) = 69° (x) = 6% (x) = log € = z) = —log(z — @). (115) 


where 24") — @ is the inverse barrier function, @"°2) = 2 the logarithmic (log) barrier function, and ¢# 
the shift of the origin to create a buffer zone that prevents the minimization iterate from getting too close to 
the feasible-domain boundary, as shown in Figure 47. Shown in the right subfigure are the cases for 7 = 0.1 
(for log barrier only) and @ = 0.5. While the buffer-zone depth of the inverse function gradually decreases 
toward the boundary, the buffer-zone depth for the shifted log function remains seemingly constant, with 
much slower decrease rate. 


The shorter abbreviations defined as “i = inv = inverse” and “I = log = logarithm” are to be used in 
an expanded supercripts (mn) that determine the character of the barrier function 2™, with m € {i, 1} 
(already defined above) and n € {s, p, a} (to be defined further below in the context of structural dynamics). 


Let UW, be a static solution satisfying the static PDE, i.e., the left-hand side of Eq. (82): 
Si (tg X,t)) =O, fore = 12 (110) 


JO (@) = J(G) + wba). (116) 
Barrier functions based on a static solution, with superscripts “i = inverse,” “l = log,” and “s = statics”: 


o)(q) = — 


Barrier functions based on satisfying the static nonlinear operator, with additional superscripts “p = pde,” 
and “k& = Form k”’ (other than the above): 


i=n® me 
ee 1 _ . k)\2 
BONG) = Sa ag AMG) = bog (ISM I-42), SPI= | DCS) 
i=1 
(118) 


Barrier functions based on the system acceleration (or time derivative of linear momenta), with an additional 
superscript “a = acceleration” (other than the above): 


ia) (>> 1 
= Tela 


, Bz) = —log (BI —@) . PII (61)? + 7)”. 


(119) 
Remark 2.1. Acceleration barrier function. For PDEs without an analytical static solution, the static- 
solution barrier function in Eq. (117) is clearly not convenient to use. For PDEs with complex, nonlinear 
static operators, the barrier function in Eq. (118) aiming at satisfying such a static operator is computation- 
ally inefficient. On the contrary, since the system acceleration is readily available, the acceleration barrier 
function in Eq. (119) is easily implemented, and computationally efficient. | 
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; 


Barrier-function domain 


Barrier-function domain 


Figure 48: Barrier-function domain. Section 2. The barrier function domain (left), a subset 
of the computational domain bounded by X ¢€ [0,1] and t € [0,tmax] could be the rectangle 
ABCD, or just the segment EF, with both avoiding the ramp zone that occurs for certain distri- 
bution of nodes in the computational domain and the network architecture, such as a random 
distribution of nodes (in a static solution) with a neural network having 4 hidden layers, and 32 
neurons per layer for a pinned-free bar (right). 


Remark 2.2. History of logarithmic and inverse barrier functions. In structural dynamics, as soon as we 
observed through our numerical experiments that the PINN minimization process could lead to a static 
solution, we immediately thought about additing an inverse penalty function to the PINN loss function to 
prevent the minimization iterate from reaching the static solution. By taking the logarigthm of the inverse 
barrier function, we obtain the log barrier function. This was done before we checked the optimization 
literature, such as [22] p. 417, from where we learned of [23] and [24]. It was surprising that, according 
to the classic book by Fiacco and McCormick [23] p. 7, the years of appearance of these barrier functions 
were reversed, with the logarithmic barrier function introduced by Frisch in 1954 [25] and 1955 [24], and 
the inverse barrier function introduced by Carroll a few years later in 1959 [26] and 1961 [27]. | 


As an example of how the barrier function works, consider the static solution in SubFigs 27c-27d, 
obtained with DDE-T Form 1, using a network with N51 W32 H2 and 1,185 parameters. In Figure 50, 
SubFigs 50c-50d, an acceleration barrier (Appendix 2, Remark 2.1) was used, and effectively helped prevent 
the static solution, while achieving a “Very good” free-end displacement at Step 25,000, with a damping 
percentage of -0.6%, which measures the ratio between two successvive local maxima (peaks) of free-end 
displacement at {0.378, 0.379}, respectively, below the more accurate peak amplitude of about 0.5 shown in 
Figure 28. At Step 50,000, this ratio yielded a damping percentage of -0.3%, and was thus even classified as 
quasi-perfect. Even though the shape of the response was far from being recognized as good, the vibration 
period ressembled the accurate vibration period in Figure 28 (or in Figure 5), with the times for the local 
maxima being {2.4, 5.6}, instead of {2., 6.}. 


As for the selection of the barrier depth and weight, see Figure 51 for the influence of the barrier weight 
w on the solution, keeping the barrier depth @ = 1. It is also possible to vary the barrier depth to illustrate 
its influence on the solution. 

Upon removal of the barrier after Step 50,000, the static solution sprung back quickly, as shown by 
a precipitous drop in the loss function, the shape, and the characteristic free-end displacement with a flat 
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Barrier-function domain 
Barrier-function domain 


Sharp.jump 
No ramp 


1.0 


Figure 49: Expected effects of barrier function. Section 2. Pinned-free bar, axial motion. e 
Left: NO barrier, static solution with 4 hidden layers, 128 neurons per payer. e Right: With 
barrier, dynamic solution with two periods, 2 hidden layers, 32 neurons per layer. 


plateau in SubFigs 50e-50g, demonstrating that the barrier function was the key player that effectively held 
back the static solution. 


3 Error relative to exact solution, generalization 


We provide here (1) a comparison of PINN results to the exact solution for the axial motion of a pinned- 
pinned elastic bar under a constant, uniform distributed load, (2) revealing the essentially-zero value in the 
transverse displacement and the pinned boundary conditions, and (3) a generalization of the motion in time 
beyond the space-time domain used for training. 


Compared to the exact solution, the error of the displacement time history is measured by the square- 
root of the mean squared error (MSE), abbreviated by SMSE for short, during the time interval of interest. 
The relative error (RE) is obtained by dividing the SMSE by the peak-to-peak amplitude (0.25) of the exact 
solution (0.125) at the center of the pinned-pinned elastic bar. 


Figure 52 demonstrates the limitation of the generalization—i.e., predicting the solution beyond the 
space-time domain used to train the network—that PINN could provide. It could however predict the dis- 
placement curving up to follow closely the upswing from the end of the training time domain (¢ = T = 4) 
at which the displacement and the velocity were zero as shown in SubFigure 52c until close to half a period 
beyond t = T = 4, i-e., the interval [4, 5], with the MSE = 5.47e-4 compared to the training MSE = 0.66e-4. 


Remark 3.1. PINN vs traditional FEM for generalization. Extending the time domain to obtain the nu- 
merical solution in traditional FEM using the implicit Newmark time-integration method requires additional 
discrete-system solution steps, which could significantly increase the computational cost. With PINN, there 
is no need to do optimization again, as the generalization to extend the solution beyond the training time 
domain |0, T] is by performing the low-cost prediction (function evaluation) using exactly the same neural- 
network minimizers (weights and biases) obtained during the training optimization process on the previous 
(unextended) space-time domain. No further computational-intensive optimization is required for PINN. 


The theoretically-zero transverse displacement in SubFigure 52b shows the essentially-zero value of 
the order of le-3, typical of PINN results. The zero displacement in the pinned boundary condition is not 
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(a) (27c): 1,185 params, Shape 25000 (b) (27d): 1,185 params, Free-end disp (c) Barrier effects, Shape (d) Free-end disp, Very good 


— Train loss 


—— Test loss 


0 20000 40000 60000 80000 100000 


# Steps 


(e) Barrier off at 50000 (f) No barrier, Static 100000 (g) No barrier, Free-end disp 


Figure 50: DDE-T. Pinned-free bar, static solutions, barrier functions. Appendix 2. Sub- 
Figs 50a-50b = SubFigs 27c-27d, No barrier, static solution, recalled for convenience. e Sub- 
Figs 50c-50d: Step 25,000, Shape, Free-end disp. x Barrier depth ~@ = 1, weight w = 1. 
Network: Remarks 5.3, 5.5, T=8, W=32, H=2, n_out=1, Glorot-uniform initializer, 1,185 pa- 
rameters, * regular grid. Training: Remark 5.6 (LRS 1), init_Ir=0.001. Damping%=-0.6%, 
Very good. Local maxima = {0.378, 0.379}. Time at local max = {2.4, 5.6}. > Step 50,000, 
damping%=-0.3%, Quasi-perfect free-end disp. (23105Ric) e SubFigs 50e-50g: Barrier turned 
off at Step 50,000. Static solution came back, even with random grid. (23105Ric.2) @ Figure 27, 
DDE-T static solutions. Figure 28, using our JAX script, no static solution. > Figure 38, Form 2a, GPU 
time 605 sec for 200,000 steps. Figure 45, Form 3, GPU time 537 sec for 200,000 steps. > Figure 5, 
reference solution to compare. > Figure 51, barrier-weight effects. 
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(a) w = 0.1, too small. (b) w = 0.5, better. (c) w = 0.7, better. 


Figure 51: DDE-T. Pinned-free bar, barrier weight w effects. Appendix 2. All parameters 
were the same as in Figure 50, except for the barrier weight w. e SubFig 5la: w = 0.1, 
smooth, flat pateau began to curve. e SubFig 51b: w = 0.5, waves, bigger jump after = 1. 
e SubFig 5lc: w = 0.7, waves, smaller jump after ¢ = 1, with two peaks clearly formed. e 
SubFig 50d, w = 1.0, smooth, no jump, with two distinct peaks. 


exactly satisfied, but essentially zero (i.e., of the order le-3). From this standpoint, the PINN displacement 
time history in the training interval [0,4], with its SMSE = 8e-3 and RE = 3%, approximated well the exact 
solution, with essentially-zero difference. 


Remark 3.2. Uncertainty in real boundary and initial condition. Traditional FEM can satisfy zero boundary 
and initial (BC-IC) conditions. In practice, exactly-zero BC-IC conditions are, however, an idealization, and 
thus essentially-zero BC-IC conditions could represent uncertainty in the real BC-IC conditions. a 


The zero right-end-pinned boundary condition did not generalize, i.e., remains at zero in the time in- 
terval [4,6], as shown in SubFigure 52d. SubFigure 52e shows the time history (¢ € [0,4]) of the space 
slope along the bar length % € [0,1], whereas SubFigure 52f shows the right-pinned-end space-slope time 
history (at Z = 1), which is negative in the training time interval [0, 4], indicating compressive state (as it 
should be), and is positive during generalization, corresponding to the rising curve in SubFigure 52d in the 
generalization time interval [4,6]. In general, PINN cannot generalize to have periodicity, which is longer 
than the half period [4, 5] beyond T = 4. 
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1 2 3 4 5 


of 


(c) Center disp u, generalized. 


(a) Shape w time history. 


1 2 3 4 5 6 ° 1 2 3 A 5 


(d) Right-pinned-end wu. (e) Space-slope time history. (f) Right-pinned-end slope. 


Figure 52: DDE-T. Pinned-pinned bar. Form 3.2: Step 200,000. Network: Remark 5.3, 5.5, 
T=4, W=64, H=4, n_out=10, Glorot initializer, 18,554 parameters, random grid. Training: x 
Remark 5.8 (LRS 3, NCA), init_Ir=0.01. SubFig. 52a, shape time history (t-h.). (52b) Trans- 
verse displacement t.h., essentially-zero value of order le-3. (52c) Axial displacement at bar 
center (blue), exact solution (orange), training SMSE = 8e-3, RE=3%. (52d) Right-pinned- 
end disp u (blue) cannot generalize (should be zero, orange). (52e) Bar space slope t.h. (52f) 
Right-pinned end space slope t.h. (24.1.19 Rla.1b.3) 


a 
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